ABSTRACT



An apparatus and method for performing speech signal compression, by variable rate coding of frames of digitized speech samples. The level of speech activity for each frame of digitized speech samples is determined and an output data packet rate is selected from a set of rates based upon the determined level of frame speech activity. A lowest rate of the set of rates corresponds to a detected minimum level of speech activity, such as background noise or pauses in speech, while a highest rate corresponds to a detected maximum level of speech activity, such as active vocalization. Each frame is then coded according to a predetermined coding format for the selected rate wherein each rate has a corresponding number of bits representative of the coded frame. A data packet is provided for each coded frame with each output data packet of a bit rate corresponding to the selected rate.

48 Claims, 22 Drawing Sheets 
VARIABLE RATE VOCODER

This is a Continuation of application Ser. No. 07/713,661, filed Jun. 11, 1991, now abandoned.

BACKGROUND OF THE INVENTION

I. Field of the Invention

The present invention relates to speech processing. Specifically, the present invention relates to a novel and improved method and system for compressing speech wherein the amount of compression dynamically varies while minimally impacting the quality of the reconstructed speech. Furthermore, since the compressed speech data is intended to be sent over a channel which may introduce errors, the method and system of the present invention also minimizes the impact of channel errors on voice quality.

II. Description of the Related Art

Transmission of voice by digital techniques has become widespread, particularly in long distance and digital radio telephone applications. This, in turn, has created interest in determining the least amount of information which can be sent over the channel which maintains the perceived quality of the reconstructed speech. If speech is transmitted by simply sampling and digitizing, a data rate on the order of 64 kilobits per second (kbps) is required to achieve a speech quality of conventional analog telephone. However, through the use of speech analysis, followed by the appropriate coding, transmission, and resynthesis at the receiver, a significant reduction in the data rate can be achieved.

Devices which employ techniques to compress voiced speech by extracting parameters that relate to a model of human speech generation are typically called vocoders. Such devices are composed of an encoder, which analyzes the incoming speech to extract the relevant parameters, and a decoder, which resynthesizes the speech using the parameters which it receives over the transmission channel. In order to be accurate, the model must be constantly changing. Thus the speech is divided into blocks of time, or analysis frames, during which the parameters are calculated. The parameters are then updated for each new frame.

Of the various classes of speech coders the Code Excited Linear Predictive Coding (CELP), Stochastic Coding or Vector Excited Speech Coding are of one class. An example of a coding algorithm of this particular class is described in the paper "A 4.8 kbps Code Excited Linear Predictive Coder" by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988.

The function of the vocoder is to compress the digitized speech signal into a low bit rate signal by removing all of the natural redundancies inherent in speech. Speech typically has short term redundancies due primarily to the filtering operation of the vocal tract, and long term redundancies due to the excitation of the vocal tract by the vocal cords. In a CELP coder, these operations are modelled by two filters, a short term formant filter and a long term pitch filter. Once these redundancies are removed, the resulting residual signal can be modelled as white gaussian noise, which also must be encoded. The basis of this technique is to compute the parameters of a filter, called the LPC filter, which performs short-term prediction of the speech waveform using a model of the human vocal tract. In addition, long-term effects, related to the pitch of the speech, are modeled by computing the parameters of a pitch filter, which essentially models the human vocal cords. Finally, these filters must be excited, and this is done by determining which one of a number of random excitation waveforms in a codebook results in the closest approximation to the original speech when the waveform excites the two filters mentioned above. Thus the transmitted parameters relate to three items: (1) the LPC filter, (2) the pitch filter and (3) the codebook excitation.

Although the use of vocoding techniques furthers the objective in attempting to reduce the amount of information sent over the channel while maintaining quality reconstructed speech, other techniques need be employed to achieve further reduction. One technique previously used to reduce the amount of information sent is voice activity gating. In this technique no information is transmitted during pauses in speech. Although this technique achieves the desired result of data reduction, it suffers from several deficiencies.

In many cases, the quality of speech is reduced due to clipping of the initial parts of words. Another problem with gating the channel off during inactivity is that the system users perceive the lack of the background noise which normally accompanies speech and rate the quality of the channel as lower than a normal telephone call. A further problem with activity gating is that occasional sudden noises in the background may trigger the transmitter when no speech occurs, resulting in annoying bursts of noise at the receiver.

In an attempt to improve the quality of the synthesized speech in voice activity gating systems, synthesized comfort noise is added during the decoding process. Although some improvement in quality is achieved from adding comfort noise, it does not substantially improve the overall quality since the comfort noise does not model the actual background noise at the encoder.

A more preferred technique to accomplish data compression, so as to result in a reduction of information that needs to be sent, is to perform variable rate vocoding. Since speech inherently contains periods of silence, i.e. pauses, the amount of data required to represent these periods can be reduced. Variable rate vocoding most effectively exploits this fact by reducing the data rate for these periods of silence. A reduction in the data rate, as opposed to a complete halt in data transmission, for periods of silence overcomes the problems associated with voice activity gating while facilitating a reduction in transmitted information.

It is therefore an object of the present invention to provide a novel and improved method and system for compressing speech using a variable rate vocoding technique.

SUMMARY OF THE INVENTION

The present invention implements a vocoding algorithm of the previously mentioned class of speech coders, Code Excited Linear Predictive Coding (CELP), Stochastic Coding or Vector Excited Speech Coding. The CELP technique by itself does provide a significant reduction in the amount of data necessary to represent speech in a manner that upon resynthesis results in high quality speech. As mentioned previously the vocoder parameters are updated for each frame. The vocoder of the present invention provides a variable output data rate by changing the frequency and precision of the model parameters.
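By way of illustration only, the short-term prediction performed by the LPC filter described in the Background may be sketched as follows. The sketch assumes the autocorrelation method solved by the Levinson-Durbin recursion, a conventional choice that the foregoing text does not itself specify; the function names, the predictor order and the decaying test signal are hypothetical, and windowing of the frame is omitted for brevity.

```python
def autocorrelation(frame, order):
    """Return the autocorrelation values R[0..order] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Convention (an assumption of this sketch): the predictor is
    s_hat(n) = sum_i a[i] * s(n - i), so that A(z) = 1 - sum_i a[i] z^-i.
    Returns (coefficients a[1..order], final prediction error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err  # reflection coefficient for stage m
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def residual(frame, coeffs):
    """Prediction residual e(n) = s(n) - sum_i a[i] * s(n - i)."""
    out = []
    for n, s in enumerate(frame):
        pred = sum(c * frame[n - i - 1]
                   for i, c in enumerate(coeffs) if n - i - 1 >= 0)
        out.append(s - pred)
    return out


# Hypothetical test signal: s(n) = 0.9 * s(n - 1), 160 samples (one frame).
frame = [0.9 ** n for n in range(160)]
coeffs, err = levinson_durbin(autocorrelation(frame, 2), 2)
res = residual(frame, coeffs)
```

For this test signal the first coefficient comes out close to 0.9 and the second close to zero, so the residual is nearly zero after the first sample. That residual is what remains to be represented once the short-term redundancy has been removed.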
The present invention differs most markedly from the basic CELP technique by producing a variable output data rate based on speech activity. The structure is defined so that the parameters are updated less often, or with less precision, during pauses in speech. This technique allows for an even greater decrease in the amount of information to be transmitted. The phenomenon which is exploited to reduce the data rate is the voice activity factor, which is the average percentage of time a given speaker is actually talking during a conversation. For typical two-way telephone conversations, the average data rate is reduced by a factor of 2 or more. During pauses in speech, only background noise is being coded by the vocoder. At these times, some of the parameters relating to the human vocal tract model need not be transmitted.

As mentioned previously a prior approach to limiting the amount of information transmitted during silence is called voice activity gating, a technique in which no information is transmitted during moments of silence. On the receiving side the period may be filled in with synthesized "comfort noise". In contrast, a variable rate vocoder is continuously transmitting data which in the preferred embodiment is at rates which range between approximately 8 kbps and 1 kbps. A vocoder which provides a continuous transmission of data eliminates the need for synthesized "comfort noise", with the coding of the background noise providing a more natural quality to the resynthesized speech. The present invention therefore provides a significant improvement in resynthesized speech quality over that of voice activity gating by allowing a smooth transition between speech and background.

The present invention further incorporates a novel technique for masking the occurrence of errors. Because the data is intended for transmission over a channel that may be noisy, a radio link for example, it must accommodate errors in the data. Previous techniques using channel coding to reduce the number of errors encountered can provide some success in reducing errors. However, channel coding alone does not fully provide the level of error protection necessary to ensure high quality in the reconstructed speech. In the variable rate vocoder where vocoding is occurring continuously, an error may destroy data relating to some interesting speech event, such as the start of a word or a syllable. A typical problem with linear prediction coding (LPC) based vocoders is that errors in the parameters relating to the vocal tract model will cause sounds which are vaguely human-like, and which may change the sound of the original word enough to confuse the listener. In the present invention, errors are masked to decrease their perceptibility to the listener. Error masking thus as implemented in the present invention provides a drastic decrease in the effect of errors on speech intelligibility.

Because the maximum amount that any parameter can change is limited to smaller ranges at low rates, errors in the parameters transmitted at these rates will affect speech quality less. Since errors in the different rates have different perceived effects on speech quality, the transmission system can be optimized to give more protection to the higher rate data. Therefore as an added feature, the present invention provides a robustness to channel errors.

The present invention in implementing a variable rate output version of the CELP algorithm results in speech compression which dynamically varies from 8:1 to 64:1 depending on the voice activity. The just mentioned compression factors are cited with reference to a μ-law input, with the compression factors higher by a factor of 2 for a linear input. Rate determination is made on a frame by frame basis so as to take full advantage of the voice activity factor. Even though less data is produced for pauses in speech, the perceived degradation of the resynthesized background noise is minimized. Using the techniques of the present invention, near-toll quality speech can be achieved at a maximum data rate of 8 kbps and an average data rate on the order of 3.5 kbps in normal conversation.

Since the present invention enables short pauses in speech to be detected, a decrease in the effective voice activity factor is realized. Rate decisions can be made on a frame by frame basis with no hangover, so the data rate may be lowered for pauses in speech as short as the frame duration, typically 20 msec. in the preferred embodiment. Therefore pauses such as those between syllables may be captured. This technique decreases the voice activity factor beyond what has traditionally been considered, as not only long duration pauses between phrases, but also shorter pauses can be encoded at lower rates.

Since rate decisions are made on a frame basis, there is no clipping of the initial part of the word, such as in a voice activity gating system. Clipping of this nature occurs in voice activity gating systems due to a delay between detection of the speech and a restart in transmission of data. Use of a rate decision based upon each frame results in speech where all transitions have a natural sound.

With the vocoder always transmitting, the speaker's ambient background noise will continually be heard on the receiving end thereby yielding a more natural sound during speech pauses. The present invention thus provides a smooth transition to background noise. What the listener hears in the background during speech will not suddenly change to a synthesized comfort noise during pauses as in a voice activity gating system.

Since background noise is continually vocoded for transmission, interesting events in the background can be sent with full clarity. In certain cases the interesting background noise may even be coded at the highest rate. Maximum rate coding may occur, for example, when there is someone talking loudly in the background, or if an ambulance drives by a user standing on a street corner. Constant or slowly varying background noise will, however, be encoded at low rates.

The use of variable rate vocoding has the promise of increasing the capacity of a Code Division Multiple Access (CDMA) based digital cellular telephone system by more than a factor of two. CDMA and variable rate vocoding are uniquely matched, since, with CDMA, the interference between channels drops automatically as the rate of data transmission over any channel decreases. In contrast, consider systems in which transmission slots are assigned, such as TDMA or FDMA. In order for such a system to take advantage of any drop in the rate of data transmission, external intervention is required to coordinate the reassignment of unused slots to other users. The inherent delay in such a scheme implies that the channel may be reassigned only during long speech pauses. Therefore, full advantage cannot be taken of the voice activity factor. However, with external coordination, variable rate vocoding is useful in systems other than CDMA because of the other mentioned reasons.
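The compression factors and average data rate cited in this summary can be checked with a few lines of arithmetic. The following is a minimal sketch: the 64 kbps μ-law reference and the 8 kbps and 1 kbps vocoder rates are taken from the text, whereas the function names and the particular mix of frame rates are hypothetical and were chosen only so that the average lands near the 3.5 kbps figure quoted above.

```python
MULAW_INPUT_BPS = 64_000    # 8 kHz sampling x 8 bit mu-law samples
LINEAR_INPUT_BPS = 128_000  # 8 kHz sampling x 16 bit linear samples


def compression_ratio(vocoder_bps, input_bps=MULAW_INPUT_BPS):
    """Ratio of the uncompressed input rate to the vocoder output rate."""
    return input_bps / vocoder_bps


def average_rate(rate_mix):
    """Average output rate in bps.

    rate_mix maps a data rate in bps to the fraction of frames sent at
    that rate; the fractions must sum to 1.
    """
    if abs(sum(rate_mix.values()) - 1.0) > 1e-9:
        raise ValueError("frame fractions must sum to 1")
    return sum(rate * fraction for rate, fraction in rate_mix.items())


# Hypothetical frame-rate mix for a two-way conversation (not from the text).
example_mix = {8000: 0.32, 4000: 0.06, 2000: 0.12, 1000: 0.50}

highest = compression_ratio(8000)            # full rate against mu-law input
lowest = compression_ratio(1000)             # eighth rate against mu-law input
linear_highest = compression_ratio(8000, LINEAR_INPUT_BPS)
mean_bps = average_rate(example_mix)
```

With these inputs the ratios against a μ-law input are 8:1 and 64:1, and they double against a 16 bit linear input, in agreement with the factor of 2 stated above.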
In a CDMA system speech quality can be slightly degraded at times when extra system capacity is desired. Abstractly speaking, the vocoder can be thought of as multiple vocoders all operating at different rates with different resultant speech qualities. Therefore the speech qualities can be mixed in order to further reduce the average rate of data transmission. Initial experiments show that by mixing full and half rate vocoded speech, e.g. the maximum allowable data rate is varied on a frame by frame basis between 8 kbps and 4 kbps, the resulting speech has a quality which is better than half rate variable, 4 kbps maximum, but not as good as full rate variable, 8 kbps maximum.

It is well known that in most telephone conversations, only one person talks at a time. As an additional function for full-duplex telephone links a rate interlock may be provided. If one direction of the link is transmitting at the highest transmission rate, then the other direction of the link is forced to transmit at the lowest rate. An interlock between the two directions of the link can guarantee no greater than 50% average utilization of each direction of the link. However, when the channel is gated off, such as the case for a rate interlock in activity gating, there is no way for a listener to interrupt the talker to take over the talker role in the conversation. The present invention readily provides the capability of a rate interlock by control signals which set the vocoding rate.

Finally, it should be noted that by using a variable rate vocoding scheme, signalling information can share the channel with speech data with a very minimal impact on speech quality. For example, a high rate frame may be split into two pieces, half for sending the lower rate voice data and the other half for the signalling data. In the vocoder of the preferred embodiment only a slight degradation in speech quality between full and half rate vocoded speech is realized. Therefore, the vocoding of speech at the lower rate for shared transmission with other data results in an almost imperceptible difference in speech quality to the user.

BRIEF DESCRIPTION OF THE DRAWINGS

The features, objects, and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout and wherein:

FIGS. 1a-1e illustrate in graphical form the vocoder analysis frames and subframes for various rates;

FIGS. 2a-2d are a series of charts illustrating the vocoder output bit distribution for various rates;

FIG. 3 is a generalized block diagram of an exemplary encoder;

FIG. 4 is an encoder flow chart;

FIG. 5 is a generalized block diagram of an exemplary decoder;

FIG. 6 is a decoder flow chart;

FIG. 7 is a more detailed functional block diagram of the encoder;

FIG. 8 is a block diagram of an exemplary Hamming window and autocorrelation subsystems;

FIG. 9 is a block diagram of an exemplary rate determination subsystem;

FIG. 10 is a block diagram of an exemplary LPC analysis subsystem;

FIG. 11 is a block diagram of an exemplary LPC to LSP transformation subsystem;

FIG. 12 is a block diagram of an exemplary LPC quantization subsystem;

FIG. 13 is a block diagram of exemplary LSP interpolation and LSP to LPC transformation subsystems;

FIG. 14 is a block diagram of the adaptive codebook for the pitch search;

FIG. 15 is a block diagram of the encoder's decoder;

FIG. 16 is a block diagram of the pitch search subsystem;

FIG. 17 is a block diagram of the codebook search subsystem;

FIG. 18 is a block diagram of the data packing subsystem;

FIG. 19 is a more detailed functional block diagram of the decoder;

FIGS. 20a-20d are charts illustrating the decoder received parameters and subframe decoding data for various rates;

FIGS. 21a-21c are charts further illustrating the decoder received parameters and subframe decoding data [...];

FIG. 22 is a block diagram of the LSP inverse quantization subsystem;

FIG. 23 is a block diagram in greater detail of the [...] and automatic gain control; and

FIG. 24 is a chart illustrating the adaptive brightness filter characteristics.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

In accordance with the present invention, sounds such as speech and/or background noise are sampled and digitized using well known techniques. For example the analog signal may be converted to a digital format by the standard 8 bit/μ-law format followed by a μ-law/uniform code conversion. In the alternative, the analog signal may be directly converted to digital form in a uniform pulse code modulation (PCM) format. Each sample in the preferred embodiment is thus represented by one 16 bit word of data. The samples are organized into frames of input data wherein each frame is comprised of a predetermined number of samples. In the exemplary embodiment disclosed herein an 8 kHz sampling rate is considered. Each frame is comprised of 160 samples or of 20 msec. of speech at the 8 kHz sampling rate. It should be understood that other sampling rates and frame sizes may be used.

The field of vocoding includes many different techniques for speech coding, one of which is the CELP coding technique. A summary of the CELP coding technique is described in the previously mentioned paper "A 4.8 kbps Code Excited Linear Predictive Coder". The present invention implements a form of the CELP coding techniques so as to provide a variable rate in coded speech data wherein the LPC analysis is performed upon a constant number of samples, and the pitch and codebook searches are performed on varying numbers of samples depending upon the transmission rate. In concept the CELP coding techniques as applied to the present invention are discussed with reference to FIGS. 3 and 5.

In the preferred embodiment of the present invention, the speech analysis frames are 20 msec. in length, implying that the extracted parameters are transmitted in a burst 50 times per second. Furthermore the rate of data transmission is varied from roughly 8 kbps to 4 kbps to 2 kbps, and to 1 kbps. At full rate (also referred to as
rate 1), data transmission is at an 8.55 kbps rate with the parameters encoded for each frame using 171 bits including an 11 bit internal CRC (Cyclic Redundancy Check). Absent the CRC bits the rate would be 8 kbps. At half rate (also referred to as rate 1/2), data transmission is at a 4 kbps rate with the parameters encoded for each frame using 80 bits. At quarter rate (also referred to as rate 1/4), data transmission is at a 2 kbps rate with the parameters encoded for each frame using 40 bits. At eighth rate (also referred to as rate 1/8), data transmission is slightly less than a 1 kbps rate with the parameters encoded for each frame using 16 bits.

FIG. 1 graphically illustrates an exemplary analysis frame of speech data 10 and the relationship of a Hamming window 12 used in LPC analysis. LPC analysis frame, and pitch and codebook subframes for the different rates are illustrated in graphical form in FIGS. 2a-2d. It should be understood that the LPC analysis frame for all rates is the same size.

Referring now to the drawings, and in particular FIG. 1a, LPC analysis is accomplished using the 160 speech data samples of frame 10 which are windowed using Hamming window 12. As illustrated in FIG. 1a, the samples, s(n), are numbered 0-159 within each frame. Hamming window 12 is positioned such that it is offset within frame 10 by 60 samples. Thus Hamming window 12 starts at the 60th sample, s(59), of the current data frame 10 and continues through and inclusive of the 59th sample, s(58), of a following data frame 14. The weighted data generated for a current frame, frame 10, therefore also contains data that is based on data from the next frame, frame 14.

Depending upon the data transmission rate, searches are performed to compute the pitch filter and codebook excitation parameters multiple times on different subframes of data frame 10 as shown in FIGS. 1b-1e. It should be understood that in the preferred embodiment only one rate is selected for frame 10 such that the pitch and codebook searches are done in various size subframes corresponding to the selected rate as described below. However for purposes of illustration, the subframe structure of the pitch and codebook searches for the various allowed rates of the preferred embodiment for frame 10 are shown in FIGS. 1b-1e.

At all rates, there is one LPC computation per frame 10 as illustrated in FIG. 1a. As illustrated in FIG. 1b, at full rate there are two codebook subframes 18 for each pitch subframe 16. At full rate there are four pitch updates, one for each of the four pitch subframes 16, each 40 samples long (5 msec). Furthermore at full rate there are eight codebook updates, one for each of the eight codebook subframes 18, each 20 samples long (2.5 msec).

At half rate, as illustrated in FIG. 1c, there are two codebook subframes 22 for each pitch subframe 20. Pitch is updated twice, once for each of the two pitch subframes 20, while the codebook is updated four times, once for each of the four codebook subframes 22. At quarter rate, as illustrated in FIG. 1d, there are two codebook subframes 26 for the single pitch subframe 24. Pitch is updated once for pitch subframe 24 while the codebook is updated twice, once for each of the two codebook subframes 26. As illustrated in FIG. 1e, at eighth rate, pitch is not determined and the codebook is updated only once in frame 28 which corresponds to frame 10.

Additionally, although the LPC coefficients are computed only once per frame, they are linearly interpolated, in a Line Spectral Pair (LSP) representation, up to four times using the resultant LSP frequencies from the previous frame to approximate the results of LPC analysis with the Hamming window centered on each subframe. The exception is that at full rate, the LPC coefficients are not interpolated for the codebook subframes. Further details on the LSP frequency computation are described later herein.

In addition to performing the pitch and codebook searches less often at lower rates, fewer bits are also allocated for the transmission of the LPC coefficients. The number of bits allocated at the various rates is shown in FIGS. 2a-2d. Each one of FIGS. 2a-2d represents the number of vocoder encoded data bits allocated to each 160 sample frame of speech. In FIGS. 2a-2d, the number in the respective LPC block 30a-30d is the number of bits used at the corresponding rate to encode the short term LPC coefficients. In the preferred embodiment the number of bits used to encode the LPC coefficients at full, half, quarter and eighth rates are respectively 40, 20, 10 and 10.

In order to implement variable rate coding, the LPCs are first transformed into Line Spectrum Pairs (LSP) and the resulting LSP frequencies are individually encoded using DPCM coders. The LPC order is 10, such that there are 10 LSP frequencies and 10 independent DPCM coders. The bit allocation for the DPCM coders is according to Table I.

TABLE I

                    DPCM CODER NUMBER
             1   2   3   4   5   6   7   8   9   10
RATE 1       4   4   4   4   4   4   4   4   4   4
RATE 1/2     2   2   2   2   2   2   2   2   2   2
RATE 1/4     1   1   1   1   1   1   1   1   1   1
RATE 1/8     1   1   1   1   1   1   1   1   1   1

Both at the encoder and the decoder the LSP frequencies are converted back to LPC filter coefficients before use in the pitch and codebook searches.

With respect to the pitch search, at full rate as illustrated in FIG. 2a, the pitch update is computed four times, once for each quarter of the speech frame. For each pitch update at the full rate, 10 bits are used to encode the new pitch parameters. Pitch updates are done a varying number of times for the other rates as shown in FIGS. 2b-2d. As the rate decreases the number of pitch updates also decreases. FIG. 2b illustrates the pitch updates for half rate which are computed twice, once for each half of the speech frame. Similarly FIG. 2c illustrates the pitch update for quarter rate which is computed once every full speech frame. As was for full rate, 10 bits are used to encode the new pitch parameters for each half and quarter rate pitch update. However for eighth rate, as illustrated in FIG. 2d, no pitch update is computed since this rate is used to encode frames when little or no speech is present and pitch redundancies do not exist.

For each 10 bit pitch update, 7 bits represent the pitch lag and 3 bits represent the pitch gain. The pitch lag is limited to be between 17 and 143. The pitch gain is linearly quantized to between 0 and 2 for representation by the 3 bit value.

With respect to the codebook search, at full rate as illustrated in FIG. 2a, the codebook update is computed eight times, once for each eighth of the speech frame. For each codebook update at the full rate, 10 bits are used to encode the new codebook parameters. Codebook updates are done a varying number of times in the





5,414,796 


other rates as shown in FIGS. 2b-2d. However, as the rate decreases the number of codebook updates also decreases. FIG. 2b illustrates the codebook updates for half rate which are computed four times, once for each quarter of the speech frame. FIG. 2c illustrates the codebook updates for quarter rate which are computed twice, once for each half of the speech frame. As was for full rate, 10 bits are used to encode the new codebook parameters for each half and quarter rate codebook update. Finally, FIG. 2d illustrates the codebook update for eighth rate which is computed once every full speech frame. It should be noted that at eighth rate 6 bits are transmitted, 2 bits representative of the codebook gain while the other 4 bits are random bits. The bit allocations for the codebook updates are described in further detail below.

The bits allocated for the codebook updates represent the data bits needed to vector quantize the pitch prediction residual. For full, half and quarter rates, each codebook update is comprised of 7 bits of codebook index plus 3 bits of codebook gain for a total of 10 bits. The codebook gain is encoded using a differential pulse code modulation (DPCM) coder operating in the log domain. Although a similar bit arrangement can be used for eighth rate, an alternate scheme is preferred. At eighth rate codebook gain is represented by 2 bits while 4 randomly generated bits are used with the received data as a seed to a pseudorandom number generator which replaces the codebook.

Referring to the encoder block diagram illustrated in FIG. 3, the LPC analysis is done in an open-loop mode. From each frame of input speech samples s(n) the LPC coefficients (α1-α10) are computed, as described later, by LPC analysis/quantization 50 for use in formant synthesis filter 60.

The computation of the pitch search, however, is done in a closed-loop mode, often referred to as an analysis-by-synthesis method. However, in the exemplary implementation a novel hybrid closed-loop/open-loop technique is used in conducting the pitch search. In the pitch search encoding is performed by selecting parameters which minimize the mean square error between the input speech and the synthesized speech. For purposes of simplification in this portion of the discussion the issue of rate is not considered. However the effect of the selected rate on pitch and codebook searches is discussed in more detail later herein.

In the conceptual embodiment illustrated in FIG. 3, perceptual weighting filter 52 is characterized by the following equations:

$$W(z) = \frac{A(z)}{A(z/\mu)} \qquad (1)$$

where

$$A(z) = 1 - \sum_{i=1}^{10} \alpha_i z^{-i} \qquad (2)$$

Formant synthesis filter 60, a weighted filter as discussed below, is characterized by the following equation:

$$H(z) = \left(\frac{1}{A(z)}\right) W(z) = \frac{1}{A(z/\mu)}$$

The input speech samples s(n) are weighted by perceptual weighting filter 52 so that the weighted speech samples x(n) are provided to a sum input of adder 62. Perceptual weighting is utilized to weight the error at the frequencies where there is less signal power. It is at these low signal power frequencies that the noise is more perceptually noticeable. The synthesized speech samples x'(n) are output from formant synthesis filter 60 to a difference input of adder 62 where they are subtracted from the x(n) samples. The difference in samples output from adder 62 is input to mean square error (MSE) element 64 where the differences are squared and then summed. The results of MSE element 64 are provided to minimization element 66 which generates values for pitch lag L, pitch gain b, codebook index I and codebook gain G.

In minimization element 66 all possible values for L, the pitch lag parameter in P(z), are [...] from multiplier 56. [...] The values of L and b that minimize the weighted error between the input speech and the synthesized speech are chosen by minimization element 66. Pitch synthesis filter 58 generates and outputs the value p(n) to formant synthesis filter 60. Once the pitch lag L and the pitch gain b for the pitch filter are found, the codebook search is performed in a similar manner.

It should be understood that FIG. 3 is a conceptual representation of the analysis-by-synthesis approach taken in the present invention. In the exemplary implementation of the present invention, the filters are not used in the typical closed loop feedback configuration. In the present invention, the feedback connection is broken during the search and replaced with an open loop formant residual, the details of which are provided later herein.

Minimization element 66 then generates values for codebook index I and codebook gain G. The output values from codebook 54, selected from a plurality of random gaussian vector values according to the codebook index I, are multiplied in multiplier 56 by the codebook gain G to produce the sequence of values c(n) used in pitch synthesis filter 58. The codebook index I and the codebook gain G that minimize the mean square error are chosen for transmission.

It should be noted that perceptual weighting W(z) is applied to both the input speech by perceptual weighting filter 52 and the synthesized speech by the weighting function incorporated within formant synthesis filter 60. Formant synthesis filter 60 is therefore
is the formant prediction filter and u. is a perceptual actually a weighted formant synthesis filter, which 
weighting parameter, which in the exemplary embodi- combines the weighting function of equation 1 with the 
ment ft =0.8. Pitch synthesis filter 58 is characterized by typical formant prediction filter characteristic 1/A(z) to 
the following equation: result in the weighted formant synthesis function of 

65 equation 3. 

i i (3) It should be understood that in the alternative, per- 

fW) i _ bz~ L ceptual weighting filter 52 may be placed between 

adder 62 and MSE element 64. In this case formant 
synthesis filter 60 would have the normal filter characteristic of 1/A(z).

FIG. 4 illustrates a flow chart of the steps involved in encoding speech with the encoder of FIG. 3. For purposes of explanation steps involving rate decision are included in the flow chart of FIG. 4. The digitized speech samples are obtained, block 80, from the sampling circuitry from which the LPC coefficients are then calculated, block 82. As part of the LPC coefficient calculation Hamming window and autocorrelation techniques are used. An initial rate decision is made, block 84, for the frame of interest based on frame energy in the preferred embodiment.

In order to efficiently code the LPC coefficients in a small number of bits, the LPC coefficients are transformed into Line Spectrum Pair (LSP) frequencies, block 86, and then quantized, block 88, for transmission. As an option an additional rate determination may be made, block 90, with an increase in the rate being made if the quantization of the LSPs for the initial rate is deemed insufficient, block 92.

For the first pitch subframe of the speech frame under analysis the LSP frequencies are interpolated and transformed to LPC coefficients, block 94, for use in conducting the pitch search. In the pitch search the codebook excitation is set to zero. In the pitch search, blocks 96 and 98, which is an analysis by synthesis method as previously discussed, for each possible pitch lag L the synthesized speech is compared with the original speech. For each value of L, an integer value, the optimum pitch gain b is determined. Of the sets of L and b values, the optimal L and b value set provides the minimum perceptually weighted mean square error between the synthesized speech and the original speech. For the determined optimum values of L and b for that pitch subframe, the value b is quantized, block 100, for transmission along with the corresponding L value. In an alternate implementation of the pitch search, the values b and L may be quantized values as part of the pitch search with these quantized values being used in conducting the pitch search. Therefore, in this implementation the need for quantization of the selected b value after the pitch search, block 100, is eliminated.

For the first codebook subframe of the speech frame under analysis the LSP frequencies are interpolated and transformed to LPC coefficients, block 102, for use in conducting the codebook search. In the exemplary embodiment however, at full rate the LSP frequencies are interpolated only down to the pitch subframe level. This interpolation and transformation step is performed for the codebook search in addition to that of the pitch search due to a difference in pitch and codebook subframe sizes for each rate, except for rate 1/8 where the issue is moot since no pitch data is computed. In the codebook search, blocks 104 and 106, the optimum pitch lag L and pitch gain b values are used in the pitch synthesis filter such that for each possible codebook index I the synthesized speech is compared with the original speech. For each value of I, an integer value, the optimum codebook gain G is determined. Of the sets of I and G values, the optimal I and G value set provides the minimum error between the synthesized speech and the original speech. For the determined optimum values of I and G for that codebook subframe, the value G is quantized, block 108, for transmission along with the corresponding I value. Again in an alternate implementation of the codebook search, the value of G may be quantized as part of the codebook search with these quantized values being used in conducting the codebook search. In this alternate implementation the need for quantization of the selected G value after the codebook search, block 108, is eliminated.

After the codebook search a decoder within the encoder is run on the optimal values of I, G, L and b. Running of the encoder's decoder reconstructs the encoder filter memories for use in future subframes.

A check is then made, block 110, to determine whether the codebook subframe upon which analysis was just completed was the last codebook subframe of the set of codebook subframes corresponding to the pitch subframe for which the pitch search was conducted. In other words a determination is made as to whether there are any more codebook subframes which correspond to the pitch subframe. In the exemplary embodiment there are only two codebook subframes per pitch subframe. If it is determined that there is another codebook subframe which corresponds to the pitch frame, steps 102-108 are repeated for that codebook subframe.

Should there be no more codebook subframes corresponding to the pitch frame, a check is made, block 112, to determine whether any other pitch subframes exist within the speech frame under analysis. If there is another pitch subframe in the current speech frame under analysis, steps 94-110 are repeated for each pitch subframe and corresponding codebook subframes. When all computations for the current speech frame under analysis are completed, values representative of the LPC coefficients for the speech frame, the pitch lag L and gain b for each pitch subframe, and the codebook index I and gain G for each codebook subframe are packed for transmission, block 114.

Referring to FIG. 5, a decoder block diagram is illustrated wherein the received values for the LPC coefficients ($a_i$'s), pitch lags and gains (L & b), and codebook indices and gains (I & G) are used to synthesize the speech. Again in FIG. 5, as in FIG. 3, rate information is not considered for purposes of simplification of the discussion. Data rate information can be sent as side information and in some instances can be derived at the channel demodulation stage.

The decoder is comprised of codebook 130 which is provided with the received codebook indices, or for eighth rate the random seed. The output from codebook 130 is provided to one input of multiplier 132 while the other input of multiplier 132 receives the codebook gain G. The output of multiplier 132 is provided along with the pitch lag L and gain b to pitch synthesis filter 134. The output from pitch synthesis filter 134 is provided along with the LPC coefficients $a_i$ to formant synthesis filter 136. The output from formant synthesis filter 136 is provided to adaptive postfilter 138 where it is filtered, and the output therefrom is the reconstructed speech. As discussed later herein, a version of the decoder is implemented within the encoder. The encoder's decoder does not include adaptive postfilter 138, but does include a perceptual weighting filter.

FIG. 6 is a flow chart corresponding to the operation of the decoder of FIG. 5. At the decoder, speech is reconstructed from the received parameters, block 150. In particular, the received value of the codebook index is input to the codebook which generates a codevector, or codebook output value, block 152. The multiplier receives the codevector along with the received codebook gain G and multiplies these values, block 154, with the resulting signal provided to the pitch synthesis filter.
It should be noted that the codebook gain G is reconstructed by decoding and inverse quantizing the received DPCM parameters. The pitch synthesis filter is provided with the received pitch lag L and gain b values along with the multiplier output signal so as to filter the multiplier output, block 156.

TABLE II
(hexadecimal; two's complement with 14 fractional bits; read left to right, top to bottom)

0x051f 0x0525 0x0536 0x0554 0x057d 0x05b1 0x05f2 0x063d
0x0694 0x06f6 0x0764 0x07dc 0x085e 0x08ec 0x0983 0x0a24
0x0ad0 0x0b84 0x0c42 0x0d09 0x0dd9 0x0eb0 0x0f90 0x1077
0x1166 0x125b 0x1357 0x1459 0x1560 0x166d 0x177f 0x1895
0x19af 0x1acd 0x1bee 0x1d11 0x1e37 0x1f5c 0x2087 0x21b0
0x22da 0x2403 0x252d 0x2655 0x277b 0x28a0 0x29c2 0x2ae1
0x2bfd 0x2d15 0x2e29 0x2f39 0x3043 0x3148 0x3247 0x333f
0x3431 0x351c 0x3600 0x36db 0x37af 0x387a 0x393d 0x39f6
0x3aa6 0x3b4c 0x3be9 0x3c7b 0x3d03 0x3d80 0x3df3 0x3e5b
0x3eb7 0x3f09 0x3f4f 0x3f89 0x3fb8 0x3fdb 0x3ff3 0x3fff

The values resulting from filtering the codebook 
vector by the pitch synthesis filter are input to the for- 
mant synthesis filter. Also provided to the formant 
synthesis filter are LPC coefficients $a_i$'s for use in filtering
the pitch synthesis filter output signal, block 158.
The LPC coefficients are reconstructed at the decoder 
for interpolation by decoding the received DPCM pa- 
rameters into quantized LSP frequencies, inverse quan- 
tizing the LSP frequencies and transforming the LSP 
frequencies to LPC coefficients $a_i$'s. The output from
the formant synthesis filter is provided to the adaptive 
postfilter where quantization noise is masked, and the 
reconstructed speech is gain controlled, block 160. The 
reconstructed speech is output, block 162, for conver- 
sion to analog form. 
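
The decoder signal path of FIGS. 5 and 6 can be sketched as follows. This is a minimal floating-point illustration under stated assumptions, not the implementation described in this specification: the function name `decode_subframe` and its history-list arguments are hypothetical, and the adaptive postfilter, gain control and the DPCM/LSP decoding steps are left out. It scales the codevector by G, applies the pitch synthesis filter 1/(1 - b z^-L), and then the formant synthesis filter 1/A(z) with A(z) = 1 - sum of a_i z^-i.

```python
def decode_subframe(codevector, G, L, b, a, pitch_hist, formant_hist):
    """Synthesize one codebook subframe (no postfilter).

    codevector   -- codebook output values for the subframe
    G, L, b      -- codebook gain, pitch lag (integer), pitch gain
    a            -- LPC coefficients [a_1, ..., a_P]
    pitch_hist   -- past pitch-filter outputs, most recent last (len >= L)
    formant_hist -- past formant-filter outputs, most recent last (len >= P)
    """
    pitch_hist = list(pitch_hist)      # copy so the caller's state is untouched
    formant_hist = list(formant_hist)
    out = []
    for c in codevector:
        # Pitch synthesis filter: p(n) = G*c(n) + b * p(n - L)
        p = G * c + b * pitch_hist[-L]
        pitch_hist.append(p)
        # Formant synthesis filter: y(n) = p(n) + sum_i a_i * y(n - i)
        y = p + sum(a[i] * formant_hist[-1 - i] for i in range(len(a)))
        formant_hist.append(y)
        out.append(y)
    return out, pitch_hist, formant_hist
```

The returned history lists would be passed back in for the next subframe, which mirrors the filter memories that the text says must be carried between subframes.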

Referring now to the block diagram illustration of 
FIGS. 7a and 7b, further details on the speech encoding
techniques of the present invention are described. In 
FIG. 7a, each frame of digitized speech samples is pro- 
vided to a Hamming window subsystem 200 where the 
input speech is windowed before computation of the 
autocorrelation coefficients in autocorrelation subsys- 
tem 202. 

Hamming window subsystem 200 and autocorrelation
subsystem 202 are illustrated in an exemplary
implementation in FIG. 8. Hamming window subsystem
200 is comprised of lookup table 250, typically an
80×16 bit Read Only Memory (ROM), and multiplier
252. For each rate the window of speech is centered
between the 139th and the 140th sample of each analysis 
frame which is 160 samples long. The window for com- 
puting the autocorrelation coefficients is thus offset 
from the analysis frame by 60 samples. 

Windowing is done using a ROM table containing 80 
of the 160 $W_H(n)$ values, since the Hamming window is
symmetric around the center. The offset of the Ham- 
ming window is accomplished by skewing the address 
pointer of the ROM by 60 positions with respect to the 
first sample of an analysis frame. These values are multi- 
plied in single precision with the corresponding input 
speech samples by multiplier 252. Let s(n) be the input 
speech signal in the analysis window. The windowed 
speech signal $s_w(n)$ is thus defined by:

$$s_w(n) = s(n+60)\,W_H(n) \quad \text{for } 0 \le n \le 79 \tag{5}$$

and

$$s_w(n) = s(n+60)\,W_H(159-n) \quad \text{for } 80 \le n \le 159. \tag{6}$$
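
Equations (5) and (6) can be illustrated with a short sketch. The helper names are hypothetical, and the half-window is generated here from the textbook Hamming formula 0.54 - 0.46 cos(2πn/159), which is an assumption: the specification gives the window only as ROM contents. The function `q14_to_float` shows how a 16-bit ROM word with 14 fractional bits could be read back as a number.

```python
import math

def hamming_half(n_half=80):
    """First half of a symmetric Hamming window of length 2*n_half (assumed formula)."""
    n_full = 2 * n_half
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_full - 1))
            for n in range(n_half)]

def q14_to_float(word):
    """Interpret a 16-bit word as two's complement with 14 fractional bits."""
    if word & 0x8000:
        word -= 0x10000
    return word / 16384.0

def window_frame(s, table, offset=60):
    """Apply equations (5) and (6): the stored half-window is read forwards
    for the first half and backwards for the second, and the input is taken
    `offset` samples past the start of the analysis frame."""
    n_half = len(table)
    n_full = 2 * n_half
    out = []
    for n in range(n_full):
        w = table[n] if n < n_half else table[n_full - 1 - n]
        out.append(s[n + offset] * w)
    return out
```

Because of the 60-sample offset, `s` must hold at least 220 samples, so the window reaches into the following frame's samples.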

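Exemplary values, in hexadecimal, of the contents of
lookup table 250 are set forth in Table II. These values
are interpreted as two's complement numbers having 14
fractional bits, with the table being read in the order of
left to right, top to bottom.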



Autocorrelation subsystem 202 is comprised of regis- 
ter 254, multiplexer 256, shift register 258, multiplier 
260, adder 262, circular shift register 264 and buffer 266. 
The windowed speech samples $s_w(n)$ are computed
every 20 msec. and latched into register 254. On sample
$s_w(0)$, the first sample of an LPC analysis frame, shift
registers 258 and 264 are reset to 0. On each new sample
$s_w(n)$, multiplexer 256 receives a new sample select
signal which allows the sample to enter from register
254. The new sample $s_w(n)$ is also provided to multiplier
260 where it is multiplied by the sample $s_w(n-10)$, which is
in the last position SR10 of shift register 258. The resultant
value is added in adder 262 with the value in the
last position CSR11 of circular shift register 264.

Shift registers 258 and 264 are then clocked once, replacing
$s_w(n-1)$ by $s_w(n)$ in the first position SR1 of shift register
258 and replacing the value previously in position
CSR10. Upon clocking of shift register 258 the new
sample select signal is removed from input to multiplexer
256 such that the sample $s_w(n-9)$ currently in the
position SR10 of shift register 258 is allowed to enter
multiplexer 256. In circular shift register 264 the value
previously in position CSR11 is shifted into the first
position CSR1. With the new sample select signal removed
from the multiplexer, shift register 258 is set to provide
a circular shift of the data in the shift register like
that of circular shift register 264.

Shift registers 258 and 264 are both clocked 11 times 
in all for every sample such that 11 multiply/accumulate
operations are performed. After 160 samples have
been clocked in, the autocorrelation results, which are 
contained in circular shift register 264, are clocked into 
buffer 266 as the values R(0)-R(10). All shift registers 
are reset to zero, and the process repeats for the next 
frame of windowed speech samples. 
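
The arithmetic carried out by the shift-register arrangement above can be sketched in software. This is an illustrative model only (the function name `autocorrelation_mac` and the list-based delay line are assumptions, not elements of FIG. 8): for every new windowed sample it performs one multiply/accumulate per lag, with all state starting from zero at the beginning of the frame.

```python
def autocorrelation_mac(sw, order=10):
    """Return [R(0), ..., R(order)] for one frame of windowed samples.

    delay[k] holds s_w(n - k); R[k] accumulates s_w(n) * s_w(n - k).
    Both start at zero, matching the reset at the first sample of a frame.
    """
    R = [0.0] * (order + 1)
    delay = [0.0] * (order + 1)
    for x in sw:
        # Shift the new sample into the delay line (oldest sample drops out).
        delay = [x] + delay[:-1]
        # order + 1 multiply/accumulate operations per sample (11 for order 10).
        for k in range(order + 1):
            R[k] += x * delay[k]
    return R
```

For a 160-sample frame and `order=10` this yields the eleven values R(0)-R(10) that the text describes being clocked into buffer 266.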

Referring back to FIG. 7a, once the autocorrelation 
coefficients have been computed for the speech frame, a 
rate determination subsystem 204 and an LPC analysis 
subsystem 206 use this data to respectively compute a 
frame data rate and LPC coefficients. Since these operations
are independent from one another they may be
computed in any order or even simultaneously. For 
purposes of explanation herein, the rate determination is 
described first. 

Rate determination subsystem 204 has two functions: 
(1) to determine the rate of the current frame, and (2) to 
compute a new estimate of the background noise level. 
The rate for the current analysis frame is initially deter- 
mined based on the current frame's energy, the previous 
estimate of the background noise level, the previous 
rate, and the rate command from a controlling micro- 
processor. The new background noise level is estimated 
using the previous estimate of the background noise level and the current frame energy.

The present invention utilizes an adaptive thresholding technique for rate determination. As the background noise changes so do the thresholds which are used in selecting the rate. In the exemplary embodiment, three thresholds are computed to determine a preliminary rate selection $RT_p$. The thresholds are quadratic functions of the previous background noise estimate, and are shown below:

$$T1(B) = -5.544613(10^{-6})B^2 + 4.047152B + 363.1293; \tag{7}$$

$$T2(B) = -1.529733(10^{-5})B^2 + 8.750045B + 1136.214; \tag{8}$$

and

$$T3(B) = -3.957050(10^{-5})B^2 + 18.89962B + 3346.789 \tag{9}$$

where B is the previous background noise estimate.

The frame energy is compared to the three thresholds T1(B), T2(B) and T3(B). If the frame energy is below all three thresholds, the lowest rate of transmission (1 kbps), rate 1/8 where $RT_p = 4$, is selected. If the frame energy is below two thresholds, the second rate of transmission (2 kbps), rate 1/4 where $RT_p = 3$, is selected. If the frame energy is below only one threshold, the third rate of transmission (4 kbps), rate 1/2 where $RT_p = 2$, is selected. If the frame energy is above all of the thresholds, the highest rate of transmission (8 kbps), rate 1 where $RT_p = 1$, is selected.

The preliminary rate $RT_p$ may then be modified based on the previous frame final rate $RT_r$. If the preliminary rate $RT_p$ is less than the previous frame final rate minus one ($RT_r - 1$), an intermediate rate $RT_m$ is set where $RT_m = (RT_r - 1)$. This modification process causes the rate to slowly ramp down when a transition from a high energy signal to a low energy signal occurs. However, should the initial rate selection be equal to or greater than the previous rate minus one ($RT_r - 1$), the intermediate rate $RT_m$ is set to the same as the preliminary rate $RT_p$, i.e. $RT_m = RT_p$. In this situation the rate thus immediately increases when a transition from a low energy signal to a high energy signal occurs.

Finally, the intermediate rate $RT_m$ is further modified by rate bound commands from a microprocessor. If the rate $RT_m$ is greater than the highest rate allowed by the microprocessor, the initial rate $RT_i$ is set to the highest allowable value. Similarly, if the intermediate rate $RT_m$ is less than the lowest rate allowed by the microprocessor, the initial rate $RT_i$ is set to the lowest allowable value.

In certain cases it may be desirable to code all speech at a rate determined by the microprocessor. The rate bound commands can be used to set the frame rate at the desired rate by setting the maximum and minimum allowable rates to the desired rate. The rate bound commands can be used for special rate control situations such as rate interlock, and dim and burst transmission, both described later. In an alternate embodiment the LPC coefficients can be calculated prior to the rate determination. Since the calculated LPC coefficients reflect the spectral properties of the input speech frame, the coefficients can be used as an indication of speech activity. Thus, the rate determination can be done based upon the calculated LPC coefficients.

FIG. 9 provides an exemplary implementation of the rate decision algorithm. To start the computation, register 270 is preloaded with the value 1 which is provided to adder 272. Circular shift registers 274, 276 and 278 are respectively loaded with the first, second and third coefficients of the quadratic threshold equations (7)-(9). For example, the last, middle and first positions of circular shift register 274 are respectively loaded with the first coefficient of the equations from which T1, T2 and T3 are computed. Similarly, the last, middle and first positions of circular shift register 276 are respectively loaded with the second coefficient of the equations from which T1, T2 and T3 are computed. Finally, the last, middle and first positions of circular shift register 278 are respectively loaded with the constant term of the equations from which T1, T2 and T3 are computed. In each of circular shift registers 274, 276 and 278, the value is output from the last position.

In computing the first threshold T1 the previous frame background noise estimate B is squared by multiplying the value by itself in multiplier 280. The resultant $B^2$ value is multiplied by the first coefficient, $-5.544613(10^{-6})$, which is output from the last position of circular shift register 274. This resultant value is added in adder 286 with the product of the background noise B and the second coefficient, 4.047152, output from the last position of circular shift register 276, from multiplier 284. The output value from adder 286 is then added in adder 288 with the constant term, 363.1293, output from the last position of circular shift register 278. The output from adder 288 is the computed value of T1.

The computed value of T1 output from adder 288 is subtracted in adder 290 from the frame energy value $E_f$ which in the exemplary embodiment is the value R(0) in the linear domain, provided from the autocorrelation subsystem.

In an alternative implementation, frame energy $E_f$ may also be represented in the log domain in dB where it is approximated by the log of the first autocorrelation coefficient R(0) normalized by the effective window length:

$$E_f = 10\log_{10}\left(\frac{R(0)}{L_A/2}\right) \tag{10}$$

where $L_A$ is the autocorrelation window length. It should also be understood that voice activity may also be measured from various other parameters including pitch prediction gain $G_p$ or formant prediction gain $G_a$:

$$G_a = 10\log_{10}\left(\frac{E^{(10)}}{E^{(0)}}\right) \tag{11}$$

where $E^{(10)}$ is the prediction residual energy after the 10th iteration and $E^{(0)}$ is the initial LPC prediction residual energy, as described later with respect to LPC analysis, which is the same as R(0).

From the output of adder 290, the complement of the sign bit of the resulting two's complement difference is extracted by comparator or limiter 292 and provided to adder 272 where it is added with the output of register 270. Thus, if the difference between R(0) and T1 is positive, register 270 is incremented by one. If the difference is negative, register 270 remains the same.

Circular registers 274, 276 and 278 are then cycled so the coefficients of the equation for T2, equation (8), appear at the output thereof. The process of computing the threshold value T2 and comparing it with the frame energy is repeated as was discussed with respect to the
process for threshold value T1. Circular registers 274, 276 and 278 are then again cycled so the coefficients of the equation for T3, equation (9), appear at the output thereof. The computation for threshold value T3 and comparison to the frame energy proceed as was described above. After completion of all three threshold computations and comparisons, register 270 contains the initial rate estimate $RT_i$. The preliminary rate estimate $RT_p$ is provided to rate ramp down logic 294. Also provided to logic 294 is the previous frame final rate $RT_r$ from the LSP frequency quantization subsystem that is stored in register 298. Logic 294 computes the value ($RT_r - 1$) and provides as an output the larger of the preliminary rate estimate $RT_p$ and the value ($RT_r - 1$). The value $RT_m$ is provided to rate limiter logic 296.

As mentioned previously, the microprocessor provides rate bound commands to the vocoder, particularly to logic 296. In a digital signal processor implementation, this command is received in logic 296 before the LPC analysis portion of the encoding process is completed. Logic 296 ensures that the rate does not exceed the rate bounds and modifies the value $RT_m$ should it exceed the bounds. Should the value $RT_m$ be within the range of allowable rates it is output from logic 296 as the initial rate value $RT_i$. The initial rate value $RT_i$ is output from logic 296 to LSP quantization subsystem 210 of FIG. 7a.

The background noise estimate as mentioned previously is used in computing the adaptive rate thresholds. For the current frame the previous frame background noise estimate B is used in establishing the rate thresholds for the current frame. However, for each frame the background noise estimate is updated for use in determining the rate thresholds for the next frame. The new background noise estimate B' is determined in the current frame based on the previous frame background noise estimate B and the current frame energy $E_f$.

In determining the new background noise estimate B' for use during the next frame (as the previous frame background noise estimate B) two values are computed. The first value $V_1$ is simply the current frame energy $E_f$. The second value $V_2$ is the larger of B+1 and KB, where K=1.00547. To prevent the second value from growing too large, it is forced to be below a large constant M=160,000. The smaller of the two values $V_1$ or $V_2$ is chosen as the new background noise estimate B'. Mathematically,

$$V_1 = R(0) \tag{12}$$

$$V_2 = \min(160000, \max(KB, B+1)) \tag{13}$$

and the new background noise estimate B' is:

$$B' = \min(V_1, V_2) \tag{14}$$

where min(x,y) is the minimum of x and y, and max(x,y) is the maximum of x and y.

FIG. 9 further shows an exemplary implementation of the background noise estimation algorithm. The first value $V_1$ is simply the current frame energy $E_f$ provided directly to one input of multiplexer 300.

The second value $V_2$ is computed from the values KB and B+1, which are first computed. In computing the values KB and B+1, the previous frame background noise estimate B stored in register 302 is output to adder 304 and multiplier 306. It should be noted that the previous frame background noise estimate B stored in register 302 for use in the current frame is the same as the new background noise estimate B' computed in the previous frame. Adder 304 is also provided with an input value of 1 for addition with the value B so as to generate the term B+1. Multiplier 306 is also provided with an input value of K for multiplication with the value B so as to generate the term KB. The terms B+1 and KB are output respectively from adder 304 and multiplier 306 to separate inputs of both multiplexer 308 and adder 310.

Adder 310 and comparator or limiter 312 are used in selecting the larger of the terms B+1 and KB. Adder 310 subtracts the term B+1 from KB and provides the resulting value to comparator or limiter 312. Limiter 312 provides a control signal to multiplexer 308 so as to select an output thereof as the larger of the terms B+1 and KB. The selected term B+1 or KB is output from multiplexer 308 to limiter 314 which is a saturation type limiter which provides either the selected term if below the constant value M, or the value M if above the value M. The output from limiter 314 is provided as the second input to multiplexer 300 and as an input to adder 316.

Adder 316 also receives at another input the frame energy value $E_f$. Adder 316 and comparator or limiter 318 are used in selecting the smaller of the value $E_f$ and the term output from limiter 314. Adder 316 subtracts the frame energy value from the value output from limiter 314 and provides the resulting value to comparator or limiter 318. Limiter 318 provides a control signal to multiplexer 300 for selecting the smaller of the $E_f$ value and the output from limiter 314. The selected value output from multiplexer 300 is provided as the new background noise estimate B' to register 302 where it is stored for use during the next frame as the previous frame background noise estimate B.

Referring back to FIG. 7, each of the autocorrelation coefficients R(0)-R(10) is output from autocorrelation subsystem 202 to LPC analysis subsystem 206. The LPC coefficients computed in LPC analysis subsystem 206 are used in both the perceptual weighting filter 52 and formant synthesis filter 60.

The LPC coefficients may be obtained by the autocorrelation method using Durbin's recursion as discussed in Digital Processing of Speech Signals, Rabiner & Schafer, Prentice-Hall, Inc., 1978. This technique is an efficient computational method for obtaining the LPC coefficients. The algorithm can be stated in the following equations:

$$E^{(0)} = R(0), \quad i = 1; \tag{15}$$

$$k_i = \left\{ R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j) \right\} \Big/ E^{(i-1)}; \tag{16}$$

$$a_i^{(i)} = k_i; \tag{17}$$

$$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)} \quad \text{for } 1 \le j \le i-1; \tag{18}$$

$$E^{(i)} = (1 - k_i^2)\, E^{(i-1)}; \text{ and} \tag{19}$$

$$\text{If } i < 10 \text{ then goto equation (16) with } i = i + 1. \tag{20}$$

The ten LPC coefficients are labeled $a_j^{(10)}$, for $1 \le j \le 10$.

Prior to encoding of the LPC coefficients, the stability of the filter must be ensured. Stability of the filter is achieved by radially scaling the poles of the filter inward by a slight amount which decreases the magnitude of the peak frequency responses while expanding the bandwidth of the peaks. This technique is commonly
known as bandwidth expansion, and is further described in the article "Spectral Smoothing in PARCOR Speech Analysis-Synthesis" by Tohkura et al., ASSP Transactions, December 1978. In the present case bandwidth expansion can be efficiently done by scaling each LPC coefficient. Therefore, as set forth in Table III, the resultant LPC coefficients are each multiplied by a corresponding hex value to yield the final output LPC coefficients $\alpha_1$-$\alpha_{10}$ of LPC analysis subsystem 206. It should be noted that the values presented in Table III are given in hexadecimal with 15 fractional bits in two's complement notation. In this form the value 0x8000 represents -1.0 and the value 0x7333 (or 29491) represents 0.899994 = 29491/32768.

TABLE III

α1 = a1(10) · 0x7333
α2 = a2(10) · 0x67ae
α3 = a3(10) · [value illegible in source]
α4 = a4(10) · 0x53fb
α5 = a5(10) · 0x4b95
α6 = a6(10) · 0x4406
α7 = a7(10) · 0x3d38
α8 = a8(10) · [value illegible in source]
α9 = a9(10) · 0x3196
α10 = a10(10) · 0x2ca1

The operations are preferably performed in double precision, i.e. 32 bit divides, multiplies and additions. Double precision accuracy is preferred in order to maintain the dynamic range of the autocorrelation functions and filter coefficients.

In FIG. 10, a block diagram of an exemplary embodiment of the LPC subsystem 206 is shown which implements equations (15)-(20) above. LPC subsystem 206 is comprised of three circuit portions, a main computation circuit 330 and two buffer update circuits 332 and 334 which are used to update the registers of the main computation circuit 330. Computation is begun by first loading the values R(1)-R(10) into buffer 340. To start the calculation, register 346 is preloaded with the value R(1) via multiplexer 344. Register 348 is initialized with R(0) via multiplexer 350, buffer 352 (which holds 10 $a_j^{(i-1)}$ values) is initialized to all zeroes via multiplexer 354, buffer 356 (which holds 10 $a_j^{(i)}$ values) is initialized to all zeroes via multiplexer 358, and i is set to 1 for the computational cycle. For purposes of clarity counters

multiplier 370 with the value $E^{(i-1)}$ from register 348. The resulting value $E^{(i)}$ is input to register 348 via multiplexer 350 for storage as the value $E^{(i-1)}$ for the next cycle.

The value $k_i$ is then used to calculate the value $a_i^{(i)}$ as in equation (17). In this case the value $k_i$ is input to buffer 356 via multiplexer 358. The value $k_i$ is also used in buffer update circuit 334 to calculate the values $a_j^{(i)}$ from the values $a_j^{(i-1)}$ as in equation (18). The values currently stored in buffer 352 are used in computing the values $a_j^{(i)}$. As indicated in equation (18), there are i-1 calculations in the i-th cycle. In the i=1 iteration no such calculations are required. For each value of j for the i-th cycle a value of $a_j^{(i)}$ is computed. In computing each value of $a_j^{(i)}$, each value of $a_{i-j}^{(i-1)}$ is multiplied in multiplier 372 with the value $k_i$ for output to adder 374. In adder 374 the value $k_i a_{i-j}^{(i-1)}$ is subtracted from the value $a_j^{(i-1)}$ also input to adder 374. The result of each multiplication and addition is provided as the value of $a_j^{(i)}$ to buffer 356 via multiplexer 358.

Once the values $a_i^{(i)}$ and $a_j^{(i)}$ are computed for the current cycle, the values just computed and stored in buffer 356 are output to buffer 352 via multiplexer 354. The values stored in buffer 356 are stored in corresponding positions in buffer 352. Buffer 352 is thus updated for computing the value $k_i$ for the i+1 cycle.

It is important to note that data $a_j^{(i-1)}$ generated at the end of a previous cycle is used during the current cycle to generate updates $a_j^{(i)}$ for a next cycle. This previous cycle data must be retained in order to completely generate updated data for the next cycle. Thus two buffers 356 and 352 are utilized to preserve this previous cycle data until the updated data is completely generated.

The above description is written with respect to a parallel transfer of data from buffer 356 to buffer 352 upon completion of the calculation of the updated values. This implementation ensures that the old data is retained during the entire process of computing the new data, without loss of the old data before it is completely used, as would occur in a single buffer arrangement. The described implementation is one of several implementations that are readily available for achieving the same result. For example, buffers 352 and 356 may be multi-
fori and j and other computational cycle control are not plexed such that upon calculating the value k/ for a 
shown but the design and integration of this type of current cycle from values stored in a first buffer, the 
logic circuitry is well within the ability of one skilled in updates are stored in the second buffer for use during 
the art in digital logic design. the next computational cycle. In this next cycle the 

The aj('- ] ) value is output from buffer 356 to com- 50 value k/is computed from the values stored in the sec- 
pute the term k/E^'-Oas set forth in equation (14). Each ond buffer. The values in the second buffer and the 
value R(i-j) is output from buffer 340 for multiplication value k/are used to generate updates for the next cycle 
with the aj('- value in multiplier 360. Each resultant with these updates stored in the first buffer. This alter- 
value is subtracted in adder 362 from the value in regis- nating of buffers enables the retention of proceeding 
ter 346. The result of each subtraction is stored in regis- 55 computational cycle values, from which updates are 
ter 346 from which the next term is subtracted. There generated, while storing update values without over- 
are i— 1 multiplications and accumulations in the i /A writing the proceeding values which are needed to 
cycle, as indicated in the summation term of equation generate the updates. Usage of this technique can mini- 
(14). At the end of this cycle, the value in register 346 is mize the delay associated with the computation of the 
divided in divider 364 by the value E(''— 0 from register 60 value k/for the next cycle. Therefore the updates for the 
348 to yield the value k,*. multiplications/accumulations in computing k/ may be 

The value k/is then used in buffer update circuit 332 done at the same time as the next value of a/' -1 ) is 
to calculate the value E(0 as in equation (19) above, computed. 

which is used as the value E('~ 0 during the next compu- The ten LPC coefficients a/ 10 ), stored in buffer 356 
tational cycle of k'. The current cycle value k/is multi- 65 upon completion of the last computational cycle 
plied by itself in multiplier 366 to obtain the value k/ 2 . (i= 10), are scaled to arrive at the corresponding final 
The value k/ 2 is then subtracted from the value of 1 in LPC coefficients a;. Scaling is accomplished by provid- 
adder 368. The result of this addition is multiplied in ing a scale select signal to multiplexers 344, 376 and 378 
so that the scaling values stored in lookup table 342, hex values of Table III, are selected for output through multiplexer 344. The values stored in lookup table 342 are clocked out in sequence and input to multiplier 360. Multiplier 360 also receives via multiplexer 376 the $\alpha_j^{(10)}$ values sequentially output from register 356. The scaled values are output from multiplier 360 via multiplexer 378 as an output to LPC to LSP transformation subsystem 208 (FIG. 7).

In order to efficiently encode each of the ten scaled LPC coefficients in a small number of bits, the coefficients are transformed into Line Spectrum Pair frequencies as described in the article "Line Spectrum Pair (LSP) and Speech Data Compression", by Soong and Juang, ICASSP '84. The computation of the LSP parameters is shown below in equations (21) and (22) along with Table IV.

The LSP frequencies are the ten roots which exist between 0 and $\pi$ of the following equations:

$$P(\omega) = \cos 5\omega + p_1 \cos 4\omega + \cdots + p_4 \cos \omega + p_5/2; \qquad (21)$$

$$Q(\omega) = \cos 5\omega + q_1 \cos 4\omega + \cdots + q_4 \cos \omega + q_5/2; \text{ and} \qquad (22)$$

where the $p_n$ and $q_n$ values for n=1, 2, 3, 4 and 5 are defined recursively in Table IV.

TABLE IV

In Table IV, the $\alpha_1, \ldots, \alpha_{10}$ values are the scaled coefficients resulting from the LPC analysis. The ten roots of equations (21) and (22) are scaled to between 0 and 0.5 for simplicity. A property of the LSP frequencies is that, if the LPC filter is stable, the roots of the two functions alternate; i.e. the lowest root, $\omega_1$, is the lowest root of $P(\omega)$, the next lowest root, $\omega_2$, is the lowest root of $Q(\omega)$, and so on. Of the ten frequencies, the odd frequencies are the roots of $P(\omega)$, and the even frequencies are the roots of $Q(\omega)$.

The root search is done as follows. First, the p and q coefficients are computed in double precision by adding the LPC coefficients as shown above. $P(\omega)$ is then evaluated every $\pi/256$ radians and these values are then evaluated for sign changes, which identify a root in that subregion. If a root is found, a linear interpolation between the two bounds of this region is then done to approximate the location of the root. One Q root is guaranteed to exist between each pair of P roots (the fifth Q root exists between the fifth P root and $\pi$) due to the ordering property of the frequencies. A binary search is done between each pair of P roots to determine the location of the Q roots. For ease in implementation, each P root is approximated by the closest $\pi/256$ value and the binary search is done between these approximations. If a root is not found, the previous unquantized values of the LSP frequencies from the last frame in which the roots were found are used.

Referring now to FIG. 11, an exemplary implementation of the circuitry used to generate the LSP frequencies is illustrated. The above described operation requires a total of 257 possible cosine values between 0 and $\pi$, which are stored in double precision in a lookup table, cosine lookup table 400 which is addressed by mod 256 counter 402. For each value of j input to lookup table 400, an output of $\cos \omega$, $\cos 2\omega$, $\cos 3\omega$, $\cos 4\omega$ and $\cos 5\omega$ are provided where:

$$\omega = j\pi/256 \qquad (23)$$

where j is a count value.

The values $\cos \omega$, $\cos 2\omega$, $\cos 3\omega$ and $\cos 4\omega$ output from lookup table 400 are input to a respective multiplier 404, 406, 408 and 410, while the value $\cos 5\omega$ is input directly to summer 412. These values are multiplied in a respective multiplier 404, 406, 408 and 410 with a respective one of the values $p_4$, $p_3$, $p_2$ and $p_1$ input thereto via multiplexers 414, 416, 418 and 420. The resultant values from this multiplication are also input to summer 412. Furthermore the value $p_5$ is provided through multiplexer 422 to multiplier 424 with the constant value 0.5, i.e. 1/2, also provided to multiplier 424. The resultant value output from multiplier 424 is provided as another input to summer 412. Multiplexers 414-422 select between the values $p_1$-$p_5$ or $q_1$-$q_5$ in response to a p/q coefficient select signal, so as to use the same circuitry for computation of both the $P(\omega)$ and $Q(\omega)$ values. The circuitry for generating the $p_1$-$p_5$ or $q_1$-$q_5$ values is not shown but is readily implemented using a series of adders for adding and subtracting the LPC coefficients and $p_1$-$p_5$ or $q_1$-$q_5$ values, along with registers for storing the $p_1$-$p_5$ or $q_1$-$q_5$ values.

Summer 412 sums the input values to provide the output $P(\omega)$ or $Q(\omega)$ value as the case may be. For purposes of ease in further discussion the case of the values of $P(\omega)$ will be considered with the values of $Q(\omega)$ computed in a similar fashion using the $q_1$-$q_5$ values. The current value of $P(\omega)$ is output from summer 412 where it is stored in register 426. The preceding value of $P(\omega)$, previously stored in register 426, is shifted to register 428. The sign bits of the current and previous values of $P(\omega)$ are exclusive OR'ed in exclusive OR gate 430 to give an indication of a zero crossing or sign change, in the form of an enable signal that is sent to linear interpolator 434. The current and previous value of $P(\omega)$ are also output from registers 426 and 428 to linear interpolator 434 which is responsive to the enable signal for interpolating the point between the two values of $P(\omega)$ at which the zero crossing occurs. This linear interpolation fractional value result, the distance from the value j-1, is provided to buffer 436 along with the value j from counter 256. Gate 430 also provides the enable signal to buffer 436 which permits the storage of the value j and the corresponding fractional value $FV_j$.

The fractional value is subtracted from the value j as output from buffer 436 in adder 438, or in the alternative may be subtracted therefrom as input to buffer 436. In the alternative a register in the j line input to buffer 436 may be used such that the value j-1 is input to buffer 436 with the fractional value input also input thereto. The fractional value may be added to the value j-1 either before storage in register 436 or upon output thereof. In any case the combined value of $j + FV_j$ or $(j-1) + FV_j$ is output to divider 440 where it is divided by the input constant value of 512. The division operation may simply be performed by merely changing the binary point location in the representative binary word. This division operation provides the necessary scaling to arrive at a LSP frequency between 0 and 0.5.

Each function evaluation of $P(\omega)$ or $Q(\omega)$ requires 5 cosine lookups, 4 double precision multiplications, and 4 additions. The computed roots are typically only accurate to about 13 bits, and are stored in single precision. The LSP frequencies are provided to LSP quantization subsystem 210 (FIG. 7) for quantization.
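
The grid scan, sign-change test and linear interpolation described above can be illustrated with a short sketch. It is not the circuit of FIG. 11: it uses floating point rather than the double precision fixed-point arithmetic of the text, the function names are hypothetical, and the coefficients $p_1$-$p_5$ are simply passed in, since Table IV is not reproduced here.

```python
import math

def eval_p(coeffs, omega):
    """P(w) = cos 5w + p1 cos 4w + p2 cos 3w + p3 cos 2w + p4 cos w + p5/2."""
    p1, p2, p3, p4, p5 = coeffs
    return (math.cos(5 * omega) + p1 * math.cos(4 * omega)
            + p2 * math.cos(3 * omega) + p3 * math.cos(2 * omega)
            + p4 * math.cos(omega) + p5 / 2)

def find_roots(coeffs, steps=256):
    """Scan w = j*pi/steps for j = 0..steps and interpolate at sign changes.

    Each root is returned scaled to the range 0..0.5, i.e. as
    ((j - 1) + FV) / (2 * steps), matching the division by 512 in the text.
    """
    roots = []
    prev = eval_p(coeffs, 0.0)
    for j in range(1, steps + 1):
        cur = eval_p(coeffs, j * math.pi / steps)
        if (prev < 0) != (cur < 0):      # exclusive OR of the two sign bits
            fv = prev / (prev - cur)     # fractional distance from j - 1
            roots.append(((j - 1) + fv) / (2 * steps))
        prev = cur
    return roots
```

With all five coefficients set to zero, $P(\omega)$ reduces to $\cos 5\omega$, whose roots lie at odd multiples of $\pi/10$, so `find_roots((0, 0, 0, 0, 0))` returns five values close to 0.05, 0.15, 0.25, 0.35 and 0.45.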

Once the LSP frequencies have been computed, they must be quantized for transmission. Each of the ten LSP frequencies centers roughly around a bias value. It should be noted that the LSP frequencies approximate the bias values when the input speech has flat spectral characteristics and no short term prediction can be done. The biases are subtracted out at the encoder, and a simple DPCM quantizer is used. At the decoder, the bias is added back. The negative of the bias value, in hexadecimal, for each LSP frequency, $\omega_1$-$\omega_{10}$, as provided from the LPC to LSP transformation subsystem is set forth in Table V. Again the values given in Table V are in two's complement with 15 fractional bits. The hex value 0x8000 (or -32768) represents -1.0. Thus the first value in Table V, the value 0xfa2f (or -1489) represents -0.045441 = -1489/32768.

TABLE V

LSP frequency    Negative Bias Value
$\omega_1$       0xfa2f
$\omega_2$       0xf45e
$\omega_3$       0xee8c
$\omega_4$       0xe8bb
$\omega_5$       0xe2e9
$\omega_6$       0xdd18
$\omega_7$       0xd746
$\omega_8$       0xd175
$\omega_9$       0xcba3
$\omega_{10}$    0xc5d2
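
The number format used in Tables III and V (16-bit two's complement with 15 fractional bits) can be checked with a short sketch. The function name `q15_to_float` is hypothetical; the hex constants are those of Table V.

```python
def q15_to_float(word):
    """Interpret a 16-bit word as two's complement with 15 fractional bits."""
    word &= 0xFFFF
    if word & 0x8000:        # sign bit set: the value is negative
        word -= 0x10000
    return word / 32768.0

# Negative bias values for the ten LSP frequencies, as listed in Table V.
NEG_BIAS_HEX = [0xfa2f, 0xf45e, 0xee8c, 0xe8bb, 0xe2e9,
                0xdd18, 0xd746, 0xd175, 0xcba3, 0xc5d2]
neg_bias = [q15_to_float(h) for h in NEG_BIAS_HEX]
```

The first entry decodes to -1489/32768, the value quoted in the text, and successive entries step down by about the same amount each time.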



The predictor used in the subsystem is 0.9 times the quantized LSP frequency from the previous frame stored in a buffer in the subsystem. This decay constant of 0.9 is inserted so that channel errors will eventually die off.

The quantizers used are linear, but vary in dynamic range and step size with the rate. Also, in high rate frames more bits are transmitted for each LSP frequency, therefore the number of quantization levels depends upon the rate. In Table VI, the bit allocation and the dynamic range of the quantization are shown for each frequency at each rate. For example, at rate 1, $\omega_1$ is uniformly quantized using 4 bits (that is, into 16 levels) with the highest quantization level being 0.025 and the lowest being -0.025.

TABLE VI

RATE             Full         Half         Quarter      Eighth
$\omega_1$       4: ± .025    2: ± .015    1: ± .01     1: ± .01
$\omega_2$       4: ± .04     2: ± .015    1: ± .01     1: ± .015
$\omega_3$       4: ± .07     2: ± .03     1: ± .01     1: ± .015
$\omega_4$       4: ± .07     2: ± .03     1: ± .01     1: ± .015
$\omega_5$       4: ± .06     2: ± .03     1: ± .01     1: ± .015
$\omega_6$       4: ± .06     2: ± .02     1: ± .01     1: ± .015
$\omega_7$       4: ± .05     2: ± .02     1: ± .01     1: ± .01
$\omega_8$       4: ± .05     2: ± .02     1: ± .01     1: ± .01
$\omega_9$       4: ± .04     2: ± .02     1: ± .01     1: ± .01
$\omega_{10}$    4: ± .04     2: ± .02     1: ± .01     1: ± .01
Total            40 bits      20 bits      10 bits      10 bits

If the quantization ranges for the rate chosen by the rate decision algorithm are not large enough or a slope overflow occurs, the rate is bumped up to the next higher rate. The rate continues to be bumped up until the dynamic range is accommodated or full rate is reached. In FIG. 12 an exemplary block diagram illustration of one implementation of the optional rate bump up technique is provided.

FIG. 12 illustrates in block diagram form an exemplary implementation of the LSP quantization subsystem 210 which includes the rate bump up circuitry. In FIG. 12, the current frame LSP frequencies are output from divider 440 (FIG. 11) to register 442 where they are stored for output during a rate bump up determination in the next frame. The previous frame LSP frequencies and the current frame LSP frequencies are respectively output from register 442 and divider 440 to rate bump up logic 442 for a current frame rate bump up determination. Rate bump up logic 442 also receives the initial rate decision, along with the rate bound commands from rate determination subsystem 204. In determining whether a rate increase is necessary, logic 442 compares the previous frame LSP frequencies with the current frame LSP frequencies based on the sum of the square of the difference between the current and previous frame LSP frequencies. The resulting value is then compared with a threshold value which, if exceeded, is an indication that an increase in rate is necessary to ensure high quality encoding of the speech. Upon exceeding the threshold value, logic 442 increments the initial rate by one rate level so as to provide an output of the final rate used throughout the encoder.
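
A minimal sketch of the rate bump up decision follows. The names `RATES` and `bump_rate` are hypothetical, the threshold is a caller-supplied assumption because the text gives no value for it, and the rate bound commands from rate determination subsystem 204 are not modeled.

```python
RATES = ["eighth", "quarter", "half", "full"]   # lowest to highest

def bump_rate(initial_rate, prev_lsp, cur_lsp, threshold):
    """Raise the rate by one level when the LSP frequencies change sharply.

    The change measure is the sum of squared differences between the
    current and previous frame LSP frequencies.
    """
    change = sum((c - p) ** 2 for c, p in zip(cur_lsp, prev_lsp))
    level = RATES.index(initial_rate)
    if change > threshold and level < len(RATES) - 1:
        level += 1                               # bump up by one rate level
    return RATES[level]
```

A frame already at full rate is returned unchanged, which mirrors the statement that the rate is bumped up only until full rate is reached.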

In FIG. 12, each LSP frequency value $\omega_1$-$\omega_{10}$ is input
one at a time to adder 450 along with the corresponding 
bias value. The bias value is subtracted from the input 
LSP value and the result thereof output to adder 452. 
Adder 452 also receives as an input a predictor value, a 
previous frame corresponding LSP value multiplied by 
a decay constant. The predictor value is subtracted 
from the output of adder 450 by adder 452. The output 
of adder 452 is provided as an input to quantizer 454. 

Quantizer 454 is comprised of limiter 456, minimum 
dynamic range lookup table 458, inverse step size 
lookup table 460, adder 462, multiplier 464 and bit mask 
466. Quantization is performed in quantizer 454 by first 
determining whether the input value is within the dy- 
namic range of quantizer 454. The input value is pro- 
vided to limiter 456 which limits the input value to the 
upper and lower bounds of the dynamic range if the 
input exceeds the bounds provided by lookup table 458. 
Lookup table 458 provides the stored bounds, accord- 
ing to Table VI, to limiter 456 in response to the rate 
input and the LSP frequency index i input thereto. The 
value output from limiter 456 is input to adder 462 
where the minimum of the dynamic range, provided by 
lookup table 458 is subtracted therefrom. The value 
output from lookup table 458 is again determined by the 
rate and LSP frequency index i in accordance with the 
minimum dynamic range values, disregarding the value 
sign, set forth in Table VI. For example the value in 
lookup table 458 for (full rate, $\omega_1$) is 0.025.

The output from adder 462 is then multiplied in multiplier 464 by a value selected from lookup table 460. Lookup table 460 contains values corresponding to the inverse of the step size for each LSP value at each rate in accordance with the values set forth in Table VI. The value output from lookup table 460 is selected by the rate and LSP frequency index i. For each rate and LSP frequency index i the value stored in lookup table 460 is the quantity $(2^n - 1)/\text{dynamic range}$, where n is the number of bits representing the quantized value. Again for example, the value in lookup table 460 for (rate 1, $\omega_1$) is (15/0.05) or 300.

The output from multiplier 464 is a value between 0 and $2^n - 1$ which is provided to bit mask 466. Bit mask 466 in response to the rate and LSP frequency index extracts from the input value the appropriate number of bits according to Table VI. The extracted bits are the n integer value bits of the input value so as to provide a bit limited output $\Delta\omega_i$. The values $\Delta\omega_i$ are the quantized unbiased differentially encoded LSP frequencies that are transmitted over the channel representative of the LPC coefficients.

The value $\Delta\omega_i$ is also fed back through a predictor comprised of inverse quantizer 468, adder 470, buffer 472 and multiplier 474. Inverse quantizer 468 is comprised of step size lookup table 476, minimum dynamic range lookup table 478, multiplier 480 and adder 482.

The value $\Delta\omega_i$ is input to multiplier 480 along with a selected value from lookup table 476. Lookup table 476 contains values corresponding to the step size for each LSP value at each rate in accordance with the values set forth in Table VI. The value output from lookup table 476 is selected by the rate and LSP frequency index i. For each rate and LSP frequency index i the value stored in lookup table 476 is the quantity $\text{dynamic range}/(2^n - 1)$, where n is the number of bits representing the quantized value. Multiplier 480 multiplies the input values and provides an output to adder 482.

Adder 482 receives as another input a value from lookup table 478. The value output from lookup table 478 is determined by the rate and LSP frequency index i in accordance with the minimum dynamic range values, disregarding the value sign, set forth in Table VI. Adder 482 adds the minimum dynamic range value provided by lookup table 478 with the value output from multiplier 480 with the resulting value output to adder 470.

Adder 470 receives as another input the predictor value output from multiplier 474. These values are added in adder 470 and stored in ten word storage buffer 472. Each previous frame value output from buffer 472 during the current frame is multiplied in multiplier 474 by a constant, 0.9. The predictor values as output from multiplier 474 are provided to both adders 452 and 470 as previously discussed.
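
The quantizer and its feedback predictor can be summarized in a short sketch for a single LSP frequency. This is an illustration under stated assumptions, not the circuit of FIG. 12: the function name `quantize` and its parameters are hypothetical, `bias` is taken as the positive bias (Table V stores its negative), the scaling is written as a ratio times $2^n - 1$ rather than a multiplication by a stored inverse step size, and the inverse quantizer is assumed to restore the signed lower bound of the dynamic range.

```python
DECAY = 0.9   # predictor decay constant given in the text

def quantize(lsp, bias, prev_recon, bits, half_range):
    """Return (index, reconstruction) for one LSP frequency.

    prev_recon is the previous frame's reconstructed value without bias;
    half_range is the +/- dynamic range from Table VI for this rate.
    """
    levels = (1 << bits) - 1                    # 2^n - 1
    lo, hi = -half_range, half_range
    predictor = DECAY * prev_recon              # output of multiplier 474
    residual = (lsp - bias) - predictor         # adders 450 and 452
    limited = min(max(residual, lo), hi)        # limiter 456
    # Shift to zero, scale to 0..levels, keep only the integer part.
    index = int((limited - lo) / (hi - lo) * levels)
    step = (hi - lo) / levels                   # step size of the inverse quantizer
    recon = index * step + lo + predictor       # adders 482 and 470
    return index, recon
```

For the full rate $\omega_1$ entry of Table VI (4 bits, ±0.025) the step size is 0.05/15, so a residual inside the dynamic range is reconstructed to within one step, and residuals outside the range are clipped to index 0 or 15.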

In the current frame the value stored in buffer 472 is the previous frame reconstructed LSP values minus the bias value. Similarly in the current frame the value output from adder 470 is the current frame reconstructed LSP values also without bias. In the current frame the output from buffer 472 and adder 470 are respectively provided to adders 484 and 486 where the bias is added into the values. The values output from adders 484 and 486 are respectively the previous frame reconstructed LSP frequency values and the current frame reconstructed LSP frequency values. LSP smoothing is done at the lower rates according to the equation:

$$\text{Smoothed LSP} = a(\text{current LSP}) + (1 - a)(\text{previous LSP}) \qquad (24)$$

where
a=0 for full rate;
a=0.1 for half rate;
a=0.5 for quarter rate; and
a=0.85 for eighth rate.

The previous frame (f-1) reconstructed LSP frequency values $\omega'_{i,f-1}$ and the current frame (f) reconstructed LSP frequency values $\omega'_{i,f}$ are output from quantization subsystem 210 to pitch subframe LSP interpolation subsystem 216 and codebook subframe LSP interpolation subsystem 226. The quantized LSP frequency values $\Delta\omega_i$ are output from LSP quantization subsystem 210 to data assembler subsystem 236 for transmission.

The LPC coefficients used in the weighting filter and the formant synthesis filter described later are appropriate for the pitch subframe which is being encoded. For pitch subframes, the interpolation of the LPC coefficients is done once for each pitch subframe as set forth in Table VII:

TABLE VII

Rate 1:    $\omega_i = 0.75\omega'_{i,f-1} + 0.25\omega'_{i,f}$ for pitch subframe 1
           $\omega_i = 0.5\omega'_{i,f-1} + 0.5\omega'_{i,f}$ for pitch subframe 2
           $\omega_i = 0.25\omega'_{i,f-1} + 0.75\omega'_{i,f}$ for pitch subframe 3
           $\omega_i = \omega'_{i,f}$ for pitch subframe 4
Rate 1/2:  $\omega_i = 0.625\omega'_{i,f-1} + 0.375\omega'_{i,f}$ for pitch subframe 1
           $\omega_i = 0.125\omega'_{i,f-1} + 0.875\omega'_{i,f}$ for pitch subframe 2
Rate 1/4:  $\omega_i = 0.625\omega'_{i,f-1} + 0.375\omega'_{i,f}$ for pitch subframe 1
Rate 1/8:  Pitch Search is not done.

Pitch subframe counter 224 is used to keep track of 
the pitch subframes for which the pitch parameters are 
computed, with the counter output provided to pitch 
subframe LSP interpolation subsystem 216 for use in the 
pitch subframe LSP interpolation. Pitch subframe 
counter 224 also provides an output indicative of a 
completion of the pitch subframe for the selected rate to 
data packing subsystem 236. 

FIG. 13 illustrates an exemplary implementation of 
pitch subframe LSP interpolation subsystem 216 for 
interpolating the LSP frequencies for the relevant pitch 
subframe. In FIG. 13, the previous and current LSP
frequencies $\omega'_{i,f-1}$ and $\omega'_{i,f}$ are respectively output from
LSP quantization subsystem to multipliers 500 and 502
where they are respectively multiplied by a constant provided
from memory 504. Memory 504 stores a set of constant 
values and in accordance with an input of the pitch 
subframe number from a pitch subframe counter, dis- 
cussed later, provides an output of constants as set forth 
in Table VII for multiplication with the previous and 
current frame LSP values. The outputs of multipliers 
500 and 502 are added in adder 506 to provide the LSP 
frequency values for the pitch subframe in accordance 
with the equations of Table VII. For each pitch sub- 
frame, once the interpolation of LSP frequencies is 
accomplished a reverse LSP to LPC transformation is 
performed to obtain the current coefficients of A(z) and 
the perceptual weighting filter. The interpolated LSP 
frequency values are thus provided to LSP to LPC 
transformation subsystem 218 of FIG. 7. 
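
The interpolation performed by multipliers 500 and 502 and adder 506 amounts to a weighted sum of the previous and current frame LSP frequencies. The sketch below uses the weight pairs of Table VII; the names `PITCH_WEIGHTS` and `interpolate_lsp` are hypothetical, and the last full rate subframe is taken to use the current frame value alone.

```python
# (weight on previous frame, weight on current frame) per pitch subframe.
# Eighth rate is absent because no pitch search is done at that rate.
PITCH_WEIGHTS = {
    "full":    [(0.75, 0.25), (0.5, 0.5), (0.25, 0.75), (0.0, 1.0)],
    "half":    [(0.625, 0.375), (0.125, 0.875)],
    "quarter": [(0.625, 0.375)],
}

def interpolate_lsp(prev_lsp, cur_lsp, rate, subframe):
    """Interpolate the LSP frequencies for a 1-based pitch subframe number."""
    w_prev, w_cur = PITCH_WEIGHTS[rate][subframe - 1]
    return [w_prev * p + w_cur * c for p, c in zip(prev_lsp, cur_lsp)]
```

Each weight pair sums to one, so interpolating between two identical frames leaves the LSP frequencies unchanged.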

LSP to LPC transformation subsystem 218 converts the interpolated LSP frequencies back into LPC coefficients for use in resynthesizing the speech. Again, the previously referenced article "Line Spectrum Pair (LSP) and Speech Data Compression", by Soong and Juang provides a full discussion and derivation of the algorithm implemented in the present invention in the transformation process. The computational aspects are such that P(z) and Q(z) can be expressed in terms of the LSP frequencies by the equations:

$$P(z) = (1 + z^{-1}) \prod_{i=1}^{5} \left(1 - 2\cos(\omega_{2i-1})z^{-1} + z^{-2}\right) \qquad (25)$$

where $\omega_i$ are the roots of the P' polynomial (odd frequencies), and

$$Q(z) = (1 - z^{-1}) \prod_{i=1}^{5} \left(1 - 2\cos(\omega_{2i})z^{-1} + z^{-2}\right) \qquad (26)$$

where $\omega_i$ are the roots of the Q' polynomial (even frequencies), and

$$A(z) = \frac{P(z) + Q(z)}{2} \qquad (27)$$

The computation is performed by first computing the values $2\cos(\omega_i)$ for all of the odd frequencies i. This computation is accomplished using a 5th order single precision Taylor Series expansion of cosine about zero (0). A Taylor expansion about the closest point in the cosine table could potentially be more accurate, but the expansion about 0 achieves sufficient accuracy and does not involve an excessive amount of computation.

Next the coefficients of the P polynomial are computed. The coefficients of a product of polynomials is the convolution of the sequences of coefficients of the individual polynomials. The convolution of the 6 sequences of z polynomial coefficients in equation (25) above, $\{1, -2\cos(\omega_1), 1\}$, $\{1, -2\cos(\omega_3), 1\}$ . . . $\{1, -2\cos(\omega_9), 1\}$, and $\{1, 1\}$, is then computed.

Once the P polynomial is computed, the same procedure is repeated for the Q polynomial where the 6 sequences of z polynomial coefficients in equation (26) above, $\{1, -2\cos(\omega_2), 1\}$, $\{1, -2\cos(\omega_4), 1\}$ . . . $\{1, -2\cos(\omega_{10}), 1\}$, and $\{1, -1\}$, and the appropriate coefficients are summed and divided by 2, i.e. shifted by 1 bit, to produce the LPC coefficients.

FIG. 13 further shows an exemplary implementation of the LSP to LPC transformation subsystem in detail. Circuit portion 508 computes the value of $-2\cos(\omega_i)$ from the input value of $\omega_i$. Circuit portion 508 is comprised of buffer 509; adders 510 and 515; multipliers 511, 512, 514, 516 and 518; and registers 513 and 515. In computing the values for $-2\cos(\omega_i)$, registers 513 and 515 are initialized to zero. Since this circuit computes $\sin(\omega_i)$, $\omega_i$ is first subtracted in adder 510 from the input constant value $\pi/2$. This value is squared by multiplier 511 and then the values $(\pi/2-\omega_i)^2$, $(\pi/2-\omega_i)^4$, $(\pi/2-\omega_i)^6$, and $(\pi/2-\omega_i)^8$ are successively computed using multiplier 512 and register 513.

The Taylor series expansion coefficients c[1]-c[4] are successively fed into multiplier 514 along with the values output from multiplier 512. The values output from multiplier 514 are input to adder 515 where along with the output of register 516 the values are summed to provide the output $c[1](\pi/2-\omega_i)^2 + c[2](\pi/2-\omega_i)^4 + c[3](\pi/2-\omega_i)^6 + c[4](\pi/2-\omega_i)^8$ to multiplier 517. The input to multiplier 517 from register 516 is multiplied in multiplier 517 with the output $(\pi/2-\omega_i)$ from adder 510. The output from multiplier 517, the value $\cos(\omega_i)$, is multiplied in multiplier 518 with the constant -2 so as to provide output $-2\cos(\omega_i)$. The value $-2\cos(\omega_i)$ is provided to circuit portion 520.

Circuit portion 520 is used in the computation of the coefficients of the P polynomial. Circuit portion 520 is comprised of memory 521, multiplier 522, and adder 523. The array of memory locations P(1) . . . P(11) is initialized to 0 except for P(1) which is set to 1. The old indexed $-2\cos(\omega_i)$ values are fed into multiplier 524 to perform the convolution of $(1, -2\cos(\omega_i), 1)$ where $1 \le i \le 5$, $1 \le j \le 2i+1$, P(j)=0 for j<1. Circuit portion 520 is duplicated (not shown) for computing the coefficients of the Q polynomial. The resultant final new values of P(1)-P(11) and Q(1)-Q(11) are provided to circuit portion 524.

Circuit portion 524 is provided for completion of the computation of the pitch subframe ten LPC coefficients $\alpha_i$ for i=1 to i=10. Circuit portion 524 is comprised of buffers 525 and 526; adders 527, 528 and 529; and divider or bit shifter 530. The final P(i) and Q(i) values are stored in buffers 525 and 526. The P(i) and P(i+1) values are summed in adder 527 while the corresponding Q(i) and Q(i+1) values are subtracted in adder 528, for $1 \le i \le 10$. The output of adders 527 and 528, respectively P(z) and Q(z), are input to adder 529 where they are summed and output as the value (P(z)+Q(z)). The output of adder is divided by two by shifting the bits by one position. Each bit shifted value of (P(z)+Q(z))/2 is an output LPC coefficient $\alpha_i$. The pitch subframe LPC coefficients are provided to pitch search subsystem 220.

The LSP frequencies are also interpolated for each codebook subframe as determined by the selected rate, except for full rate. The interpolation is computed in a manner similar to that of the pitch subframe LSP interpolations. The codebook subframe LSP interpolations are computed in codebook subframe LSP interpolation subsystem 226 and provided to LSP to LPC transformation subsystem 228 where transformation is computed in a manner similar to that of LSP to LPC transformation subsystem 218.

As discussed with reference to FIG. 3, the pitch search is an analysis by synthesis technique in which encoding is done by selecting parameters which minimize the error between the input speech and the speech synthesized using those parameters. In the pitch search, the speech is synthesized using the pitch synthesis filter whose response is expressed in equation (2). Each 20 msec. speech frame is subdivided into a number of pitch subframes which, as previously described, depends on the data rate chosen for the frame. Once per pitch subframe, the parameters b and L, the pitch gain and lag, respectively, are calculated. In the exemplary implementation herein the pitch lag L ranges between 17 and 143; for transmission reasons L=16 is reserved for the case when b=0.

The speech coder utilizes a perceptual noise weighting filter of the form set forth in equation (1). As mentioned previously the purpose of the perceptual weighting filter is to weight the error at frequencies of less power to reduce the impact of error related noise. The perceptual weighting filter is derived from the short term prediction filter previously found. The LPC coefficients used in the weighting filter, and the formant synthesis filter described later, are those interpolated values appropriate for the subframe which is being encoded.

In performing the analysis-by-synthesis operations, a copy of the speech decoder/synthesizer is used in the encoder. The form of the synthesis filter used in the speech encoder is given by equations (3) and (4). Equations (3) and (4) correspond to a decoder speech synthesis filter followed by the perceptual weighting filter, therefore called the weighted synthesis filter.

The pitch search is performed assuming a zero contribution from the codebook at the current frame, i.e. G=0. For each possible pitch lag, L, the speech is synthesized and compared with the original speech. The error between the input speech and the synthesized speech is weighted by the perceptual weighting filter before its mean square error (MSE) is calculated. The objective is to pick values of L and b, from all possible values of L and b, which minimize the error between the perceptually weighted speech and the perceptually weighted synthesized speech. The minimization of the
error may be expressed by the following equation:

$$\mathrm{MSE} = \sum_{n=0}^{L_P-1} \left(x(n) - x'(n)\right)^2 \qquad (28)$$

where $L_P$ is the number of samples in the pitch subframe, which in the exemplary embodiment is 40 for a full rate pitch subframe. The pitch gain, b, is computed which minimizes the MSE. These calculations are repeated for all allowed values of L, and the L and b that produce the minimum MSE are chosen for the pitch filter.

As discussed previously with respect to equation (28), the objective is to minimize the error between x(n), the perceptually weighted speech minus the zero input response (ZIR) of the weighted formant filter, and x'(n), the perceptually weighted synthesized speech given no memory in the filters, over all possible values of L and b, given zero contribution from the stochastic codebook (G=0). Equation (28) can be rewritten with respect to b where:

Calculating the optimal pitch lag involves the for- 
mant residual (p(n) in FIG. 3) for all time between 
n = - Lmax to n = (Lp- L m in) - 1 where L max is the max- 
imum pitch lag value, L m i n is the ininimum pitch lag 25 
value and Lp is the pitch subframe length for the se- 
lected rate, and where n=0 is the start of the pitch 
subframe. In the exemplary embodiment L max — 14-3 and 
Lmin—M. Using the numbering scheme provided in 
FIG. 14, for rate i n=-143 to n=142; for rate J, 30 
n — — 143 to n = 62; and for rate 1, n= — 143 to n=22. 
For n<0, the formant residual is simply the output of 
the pitch filter from the previous pitch subframes, 
which is held in the pitch filter memory, and is referred 
to as the closed loop formant residual. For n^O, the 35 
formant residual is the output of a formant analysis filter 
having a filter characteristic of A(z) where the input is 
the current analysis frame speech samples. For n ^0, the 
formant residual is referred to as the open loop formant 
residual and would be exactly p(n) if the pitch filter and 40 
codebook do a perfect prediction at this subframe. Fur- 
ther explanation of the computation of the optimum 
pitch lag from the associated formant residual values is 
provided with reference to FIGS. 14-17. 

The pitch search is done over 143 reconstructed 45 
closed-loop formant residual samples, p(n) for n<0, 
plus Lp— L m/ „unquantized open-loop formant residual 
samples, p 0 (n) for n^0. The search effectively changes 
gradually from mostly an open-loop search where L is 
small and thus most of the residual samples used are 50 
n>0, to a mostly closed-loop search where L is large 
and thus all of the residual samples used are n<0. For 
example, using the numbering scheme provided in FIG. 
14 at full rate, where the pitch subframe is comprised of 
40 speech samples, the pitch search begins using the set 55 
of formant residual samples numbered n= — 17 to 
n=22. In this scheme from n= — 17 to n= — 1, the sam- 
ples are closed-loop formant residual samples while 
from n=0 to n=22 the samples are open-loop formant 
residual samples. The next set of formant residual sam- 60 
pies used in determining the optimum pitch lag are the 
samples numbered n = — 18 to n=21. Again, from 
n = — 18 to n = — 1, the samples are closed-loop formant 
residual samples while from n=0ton=21 the samples 
are open-loop formant residual samples. This process 65 
continues through the sample sets until the pitch lag is 
computed for the last set of formant residual samples, 
n=~143 ton= — 104. 



MSE = -f— 2 (*«) - 6*n)) 2 
where, 

y(n) » h{n)*p\n - L) for 0 * n ^ Lp - 1 



(29) 



(30) 



where y(n) is the weighted synthesized speech with 
pitch lag L when b = 1, and h(n) is the impulse response 
of the weighted formant synthesis filter having the filter 
characteristic according to equation (3). 

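This minimization process is equivalent to maximizing
the value E_L where:

$$E_L = \frac{(E_{xy})^2}{E_{yy}} \tag{31}$$

where,

$$E_{xy} = \sum_{n=0}^{L_P-1} x(n)\,y(n) \tag{32}$$

and,

$$E_{yy} = \sum_{n=0}^{L_P-1} y(n)\,y(n) \tag{33}$$

The optimum b for the given L is found to be:

$$b_L = \frac{E_{xy}}{E_{yy}} \tag{34}$$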



This search is repeated for all allowed values of L.
The optimum b is restricted to be positive, so any L
resulting in a negative E_xy is ignored in the search. Finally
the lag, L, and the pitch gain, b, that maximize E_L are
chosen for transmission.
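
The selection rule of equations (31)-(34) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `best_pitch` and the dictionary that maps each candidate lag to its zero-state response y(n) are assumptions made for the example, and skipping a zero E_xy along with the negative ones is a choice of this sketch rather than something the text specifies.

```python
def best_pitch(x, y_by_lag):
    """Return (lag, gain) maximizing E_L = E_xy^2 / E_yy, or None.

    x        -- weighted target samples x(n) for one pitch subframe
    y_by_lag -- hypothetical mapping: lag L -> list of y(n) computed with b = 1
    """
    best = None  # (E_L, lag, gain)
    for lag, y in y_by_lag.items():
        e_xy = sum(xn * yn for xn, yn in zip(x, y))   # cross-correlation (32)
        e_yy = sum(yn * yn for yn in y)               # energy of y (33)
        if e_xy <= 0.0 or e_yy == 0.0:
            continue          # the gain b is restricted to be positive
        e_l = e_xy * e_xy / e_yy                      # quantity to maximize (31)
        if best is None or e_l > best[0]:
            best = (e_l, lag, e_xy / e_yy)            # optimum gain (34)
    return None if best is None else (best[1], best[2])
```

Because the minimum MSE for a given lag equals (E_xx - E_L)/L_P with E_xx independent of L, ranking the lags by E_L gives the same winner as ranking them by MSE.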

As mentioned previously, x(n) is actually the percep- 
tually weighted difference between the input speech 
and the ZIR of the weighted formant filter because for 
the recursive convolution, set forth below in equations
(35)-(38), the assumption is that the filter A(z) always 
starts with 0 in the filter memory. However the filter 
starting with a 0 in the filter memory is not actually the 
case. In synthesis, the filter will have a state remaining 
from the previous subframe. In the implementation, the 
effects of the initial state are subtracted from the per- 
ceptually weighted speech at the start. In this way, only 
the response of the steady-state filter A(z), all memories 
initially =0, to p(n) needs to be calculated for each L, 
and recursive convolution can be used. This value of 
x(n) needs to be computed only once but y(n), the zero 
state response of the formant filter to the output of the 
pitch filter, needs to be computed for each lag L. The 
computation of each y(n) involves many redundant 
multiplications, which do not need to be recomputed for each
lag. The method of recursive convolution described
below is used to minimize the computation required. 
With respect to recursive convolution the value y_L(n)
is defined by the value y(n) where:

$$y_L(n) = h(n) * p(n - L), \quad 17 \le L \le 143 \tag{35}$$

or,

$$y_L(n) = \sum_{i=0}^{n} h(i)\,p(n - L - i), \quad 17 \le L \le 143 \tag{36}$$

From equations (35) and (36) it can be seen that:

$$y_L(0) = p(-L)\,h(0) \tag{37}$$

$$y_L(n) = y_{L-1}(n-1) + p(-L)\,h(n), \quad 1 \le n \le L_P - 1,\; 17 < L \le 143 \tag{38}$$

In this way once the initial convolution for y_17(n) is
done, the remaining convolutions can be done recursively,
greatly decreasing the number of computations
required. For the example given above for rate 1, the
value y_17(n) is computed by equation (36) using the set
of formant residual samples numbered n = -17 to
n = 22.
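
A minimal Python sketch of the recursion in equations (37) and (38) follows, checked against the direct convolution of equation (36). The names `zsr_direct` and `zsr_all_lags`, the use of a dictionary to hold the residual p(n) at negative indices, and the zero padding of h(n) to the subframe length are assumptions made for illustration, not details taken from the described circuitry.

```python
L_MIN, L_MAX, L_P = 17, 143, 40   # full-rate values from the text

def zsr_direct(h, p, lag):
    """Equation (36): y_L(n) = sum_{i=0..n} h(i) * p(n - L - i)."""
    return [sum(h[i] * p[n - lag - i] for i in range(n + 1))
            for n in range(L_P)]

def zsr_all_lags(h, p):
    """Compute y_L(n) for every lag, convolving fully only for L = 17.

    h -- impulse response, length L_P (zeros after the truncation point)
    p -- mapping n -> formant residual sample, for n = -143 .. 22
    """
    y = zsr_direct(h, p, L_MIN)               # initial convolution
    result = {L_MIN: y}
    for lag in range(L_MIN + 1, L_MAX + 1):
        head = p[-lag] * h[0]                                         # (37)
        tail = [y[n - 1] + p[-lag] * h[n] for n in range(1, L_P)]     # (38)
        y = [head] + tail
        result[lag] = y
    return result
```

Each additional lag costs on the order of L_P multiply-adds instead of a full convolution, and with h(n) truncated to 20 samples only the first 20 terms of each update involve a multiplication.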

Referring to FIG. 15, the encoder includes a duplicate
of the decoder of FIG. 5, decoder subsystem 235 of
FIG. 7, absent the adaptive postfilter. In FIG. 15 the
input to the pitch synthesis filter 550 is the product of
the codebook value c_I(n) and the codebook gain G. The
output formant residual samples p(n) are input to formant
synthesis filter 552 where they are filtered and output as
reconstructed speech samples s'(n). The reconstructed
speech samples s'(n) are subtracted from the corresponding
input speech samples s(n) in adder 554. The
difference between the samples s'(n) and s(n) is input
to perceptual weighting filter 556. With respect to pitch
synthesis filter 550, formant synthesis filter 552 and
perceptual weighting filter 556, each filter contains a
memory of the filter state where: M_p is the memory in
the pitch synthesis filter 550; M_a is the memory in the
formant synthesis filter 552; and M_w is the memory in
the perceptual weighting filter 556.

The filter state M_a from decoder subsystem formant
synthesis filter 552 is provided to pitch search subsystem
220 of FIG. 7. In FIG. 16 the filter state M_a is
provided to zero input response (ZIR) filter 560, which
computes the ZIR of formant synthesis
filter 552. The computed ZIR value is subtracted from
the input speech samples s(n) in adder 562 with the
result weighted by perceptual weighting filter 564. The
output from perceptual weighting filter 564, x_p(n), is
used as the weighted input speech in equations (28)-(34)
where x(n)=x_p(n).
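
The reason the ZIR can be removed once, ahead of the search, is linearity: the output of a filter that starts with memory equals its zero input response plus its zero state response. The sketch below illustrates this under simplifying assumptions that are not taken from the text: a plain all-pole filter 1/A(z) with A(z) = 1 - sum of a_k z^-k, no perceptual weighting, and hypothetical function names.

```python
def all_pole(a, x, memory):
    """y(n) = x(n) + sum_k a[k-1] * y(n-k).

    memory holds the previous outputs y(-1), y(-2), ... (one per coefficient).
    """
    hist = list(memory)                 # hist[0] = y(n-1), hist[1] = y(n-2), ...
    out = []
    for sample in x:
        y = sample + sum(ak * yk for ak, yk in zip(a, hist))
        out.append(y)
        hist = [y] + hist[:-1]          # shift the newest output into the state
    return out

def zero_input_response(a, memory, length):
    """Response to an all-zero input, driven only by the stored state."""
    return all_pole(a, [0.0] * length, memory)

def zero_state_response(a, x):
    """Response to x when every memory element starts at zero."""
    return all_pole(a, x, [0.0] * len(a))
```

Subtracting the zero input response from the target therefore leaves a signal that only has to be matched by zero-state responses, which is the form the recursive convolution produces.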

Referring back to FIGS. 14 and 15, pitch synthesis
filter 550 as illustrated in FIG. 14 provides its output to
adaptive codebook 568, which is in essence a memory for storing
the closed and open loop formant residual samples
which were computed as discussed above. The closed
loop formant residual is stored in memory portion 570
while the open loop formant residual is stored in memory
portion 572. The samples are stored according to
the exemplary numbering scheme as discussed above.
The closed loop formant residual is organized as discussed
above with respect to usage for each pitch lag L
search. The open loop formant residual is computed
from the input speech samples s(n) for each pitch subframe
using the formant analysis filter 574 which uses
the decoder subsystem formant synthesis filter 552
memory M_a in computing the values of p_o(n). The values
of p_o(n) for the current pitch subframe are shifted
through a series of delay elements 576 for providing to
memory portion 572 of adaptive codebook 568. The
open loop formant residuals are stored with the first
residual sample generated numbered as 0 and the last
numbered 142.

Referring now to FIG. 16, the impulse response h(n) 
of the formant filter is computed in filter 566 and output 
to shift register 580. As discussed above with respect to 
the impulse response of the formant filter h(n), equa- 
tions (29)-(30) and (35)-(38), these values are computed 
for each pitch subframe in filter 566. To further reduce the
computational requirements of the pitch filter subsys- 
tem, the impulse response of the formant filter h(n) is 
truncated to 20 samples. 

Shift register 580 along with multiplier 582, adder 584 
and shift register 586 are configured to perform the 
recursive convolution between the values h(n) from 
shift register 580 and the values c(m) from adaptive 
codebook 568 as discussed above. This convolution 
operation is performed to find the zero-state response 
(ZSR) of the formant filter to the input coming from the 
pitch filter memory, assuming that the pitch gain is set 
to 1. In operation of the convolution circuitry, n cycles
from L_P to 1 for each m while m cycles from
(L_P - 17) - 1 to -143. In register 586 data is not forwarded
when n = 1 and data is not latched in when
n = L_P. Data is provided as an output from the convolution
circuitry when m ≤ -17.

Following the convolution circuitry is correlation 
and comparison circuitry which performs the search to 
find the optimal pitch lag L and pitch gain b. The corre- 
lation circuitry, also referred to as the mean square 
error (MSE) circuitry, computes the auto and cross- 
correlation of the ZSR with the perceptually weighted 
difference between the ZIR of the formant filter and the 
input speech, i.e. x(n). Using these values, the correla- 
tion circuitry computes the value of the optimal pitch 
gain b for each value of the pitch lag. The correlation 
circuitry is comprised of shift register 588, multipliers 
590 and 592, adders 594 and 596, registers 598 and 600, 
and divider 602. In the correlation circuitry computations
are such that n cycles from L_P to 1 while m cycles
from (L_P - 17) - 1 to -143.

The correlation circuitry is followed by comparison 
circuitry which performs the comparisons and stores 
the data in order to determine the optimum value of 
pitch lag L and gain b. The comparison circuitry is 
comprised of multiplier 604; comparator 606; registers 
608, 610 and 612; and quantizer 614. The comparison 
circuitry outputs for each pitch subframe the values for 
L and b which minimize the error between the synthesized
speech and the input speech. The value of b is
quantized into eight levels by quantizer 614 and represented
by a 3-bit value, with an additional level, b=0,
being inferred when L = 16. These values of L and
b are provided to codebook search subsystem 230 and 
data buffer 222. These values are provided via data 
packing subsystem 238 or data buffer 222 to decoder 
234 for use in the pitch search. 

Like the pitch search, the codebook search is an anal- 
ysis by synthesis coding system, in which encoding is 
done by selecting parameters which minimize the error 
between the input speech and the speech synthesized 
using those parameters. For rate 1/8, the pitch gain b is set
to zero.

As discussed previously, each 20 msec frame is subdivided
into a number of codebook subframes which, as previously
described, depends upon the data rate chosen for
the frame. Once per codebook subframe, the parameters
G and I, the codebook gain and index, respectively, are
calculated. In the calculation of these parameters the
LSP frequencies are interpolated for the subframe, except
for full rate, in codebook subframe LSP interpolation
subsystem 226 in a manner similar to that described
with reference to pitch subframe LSP interpolation
subsystem 216. The codebook subframe interpolated
LSP frequencies are also converted to LPC coefficients
by LSP to LPC transformation subsystem 228 for each
codebook subframe. Codebook subframe counter 232 is
used to keep track of the codebook subframes for which
the codebook parameters are computed, with the
counter output provided to codebook subframe LSP
interpolation subsystem 226 for use in the codebook
subframe LSP interpolation. Codebook subframe
counter 232 also provides an output, indicative of a
completion of a codebook subframe for the selected
rate, to pitch subframe counter 224.

The excitation codebook consists of 2^M code vectors
which are constructed from a unit-variance white Gaussian
random sequence. There are 128 entries in the codebook
for M=7. The codebook is organized in a recursive
fashion such that each code vector differs from the
adjacent code vector by one sample; that is, the samples
in a code vector are shifted by one position such that a
new sample is shifted in at one end and a sample is
dropped at the other. Therefore a recursive codebook
can be stored as a linear array that is 2^M + (L_C - 1) long
where L_C is the codebook subframe length. However,
to simplify the implementation and to conserve memory
space, a circular codebook 2^M samples long (128 samples)
is used.
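
As an illustration of the circular organization, the following Python sketch reads a code vector out of a 128-sample circular array. The function name and the direction in which the index advances through the array are assumptions of this sketch; the text only states that adjacent code vectors differ by a one-sample shift.

```python
def code_vector(codebook, index, length):
    """Return `length` consecutive samples starting at `index`, wrapping
    around the end of the circular codebook."""
    size = len(codebook)
    return [codebook[(index + n) % size] for n in range(length)]
```

With this layout, vector I + 1 is vector I with its first sample dropped and one new sample appended, which is the property the recursive convolution relies on.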

To reduce calculations, the Gaussian values in the
codebook are center-clipped. The values are originally
chosen from a white Gaussian process of variance 1.
Then, any value with magnitude less than 1.2 is set to
zero. This effectively sets about 75% of the values to
zero, producing a codebook of impulses. This center-clipping
of the codebook reduces the number of multiplications
needed to perform the recursive convolution
in the codebook search by a factor of 4, since multiplications
by zero need not be performed. The codebook
used in the current implementation is given below in
Table VII.

TABLE VII
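
The center-clipping rule and the quoted fraction of zeroed entries can be checked with a short Python sketch. The function names are hypothetical; the threshold of 1.2 and the unit variance come from the text, and the expected fraction uses the standard identity P(|Z| < t) = erf(t / sqrt(2)) for a unit-variance Gaussian.

```python
import math

def center_clip(values, threshold=1.2):
    """Zero every sample whose magnitude is below the threshold."""
    return [v if abs(v) >= threshold else 0.0 for v in values]

def expected_zero_fraction(threshold=1.2):
    """Probability that a unit-variance Gaussian sample gets clipped to zero."""
    return math.erf(threshold / math.sqrt(2.0))
```

For a threshold of 1.2 the expected fraction is roughly 0.77, in line with the "about 75%" figure and with the stated factor-of-4 reduction in multiplications.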



Again, the speech coder utilizes a perceptual noise
weighting filter of the form set forth in equation (1)
which includes a weighted synthesis filter of the form
set forth in equation (3). For each codebook index, I, the
speech is synthesized and compared with the original
speech. The error is weighted by the perceptual
weighting filter before its MSE is calculated.

As stated previously, the objective is to minimize the
error between x(n) and x'(n) over all possible values of
I and G. The minimization of the error may be expressed
by the following equation:

$$\mathrm{MSE} = \frac{1}{L_C} \sum_{n=0}^{L_C-1} \bigl(x(n) - x'(n)\bigr)^2 \tag{39}$$

where L_C is the number of samples in the codebook
subframe. Equation (39) may be rewritten with respect
to G where:

$$\mathrm{MSE} = \frac{1}{L_C} \sum_{n=0}^{L_C-1} \bigl(x(n) - G\,y(n)\bigr)^2 \tag{40}$$

where y is derived by convolving the impulse response
of the formant filter with the Ith code vector, assuming
that G=1. Minimizing the MSE is, in turn, equivalent
to maximizing:

$$E_I = \frac{(E_{xy})^2}{E_{yy}} \tag{41}$$

where,

$$E_{xy} = \sum_{n=0}^{L_C-1} x(n)\,y(n) \tag{42}$$

and

$$E_{yy} = \sum_{n=0}^{L_C-1} y(n)\,y(n) \tag{43}$$

The optimum G for the given I is found according to
the following equation:

$$G_I = \frac{E_{xy}}{E_{yy}} \tag{44}$$

This search is repeated for all allowed values of I. In
contrast to the pitch search, the optimum gain, G, is
allowed to be either positive or negative. Finally the
index, I, and the codebook gain, G, that maximize E_I are
chosen for transmission.
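
A minimal Python sketch of the codebook search of equations (39)-(44) is given below. It is not the described circuitry: it uses a direct convolution for every index rather than the recursive form, and the function names, the circular indexing direction and the handling of all-zero vectors are assumptions made for the example.

```python
def synthesize(h, codebook, index, length):
    """Zero-state response y(n) of the filter h to code vector `index`
    of a circular codebook, computed with unit gain."""
    size = len(codebook)
    c = [codebook[(index + n) % size] for n in range(length)]
    return [sum(h[i] * c[n - i] for i in range(min(n, len(h) - 1) + 1))
            for n in range(length)]

def codebook_search(x, h, codebook):
    """Return (index, gain) maximizing E_I = E_xy^2 / E_yy, or None."""
    best = None  # (E_I, index, gain)
    for index in range(len(codebook)):
        y = synthesize(h, codebook, index, len(x))
        e_xy = sum(xn * yn for xn, yn in zip(x, y))
        e_yy = sum(yn * yn for yn in y)
        if e_yy == 0.0:
            continue                      # all-zero vector carries no information
        e_i = e_xy * e_xy / e_yy
        if best is None or e_i > best[0]:
            best = (e_i, index, e_xy / e_yy)   # gain may be negative
    return None if best is None else (best[1], best[2])
```

Unlike the pitch search, no candidate is rejected for having a negative E_xy, so a target that is the negative of a code vector's response is matched with a negative gain.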



Again it should be noted that x(n), the perceptually 
weighted difference between the input speech and the 
ZIR of the weighted pitch and formant filters, needs to 
be computed only once. However, y(n), the zero state 
response of the pitch and formant filters for each code 
vector, needs to be computed for each index I. Because 
a circular codebook is used, the method of recursive
convolution described for pitch search can be used to
minimize the computation required.

Referring again to FIG. 15, the encoder includes a
duplicate of the decoder of FIG. 5, decoder subsystem
235 of FIG. 7 in which the filter states are computed
wherein: M_p is the memory in the pitch synthesis filter
550; M_a is the memory in the formant synthesis filter
552; and M_w is the memory in the perceptual weighting
filter 556.

The filter states M_p and M_a, respectively from decoder
subsystem pitch synthesis and formant filters 550
and 552 (FIG. 15) are provided to codebook search
subsystem 230 of FIG. 7. In FIG. 17, the filter states M_p
and M_a are provided to zero input response (ZIR)
filter 620 which computes the ZIR of pitch and formant
synthesis filters 550 and 552. The computed ZIR of the
pitch and formant synthesis filters is subtracted from the
input speech samples s(n) in adder 622 with the result
weighted by the perceptual weighting filter 624. The
output from perceptual weighting filter 624, x_c(n), is
used as the weighted input speech in the above MSE
equations (39)-(44) where x(n)=x_c(n).

In FIG. 17, the impulse response h(n) of the formant
filter is computed in filter 626 and output to shift register
628. The impulse response of the formant filter, h(n),
is computed for each codebook subframe. To further
reduce the computational requirements, the impulse
response h(n) of the formant filter is truncated to 20
samples.

Shift register 628 along with multiplier 630, adder 632
and shift register 634 are configured to perform the
recursive convolution between the values h(n) from
shift register 628 and the values c(m) from codebook
636 which contains the codebook vectors as discussed
above. This convolution operation is performed to find
the zero-state response (ZSR) of the formant filter to
each code vector, assuming that the codebook gain is
set to 1. In operation of the convolution circuitry, n
cycles from L_C to 1 for each m, while m cycles from 1
to 256. In register 586 data is not forwarded when n = 1
and data is not latched in when n = L_C. Data is provided
as an output from the convolution circuitry when m < 1.
It should be noted that the convolution circuitry must
be initialized to conduct the recursive convolution operation
by cycling m subframe size times before starting
the correlation and comparison circuitry which follow
the convolution circuitry.

The correlation and comparison circuitry conducts
the actual codebook search to yield the codebook index
I and codebook gain G values. The correlation circuitry,
also referred to as the mean square error (MSE)
circuitry, computes the auto and cross-correlation of
the ZSR with the perceptually weighted difference
between the ZIR of the pitch and formant filters, and
the input speech x'(n). In other words the correlation
circuitry computes the value of the codebook gain G
for each value of the codebook index I. The correlation
circuitry is comprised of shift register 638, multipliers
640 and 642, adders 644 and 646, registers 648 and 650,
and divider 652. In the correlation circuitry computations
are such that n cycles from L_C to 1 while m cycles
from 1 to 256.

The correlation circuitry is followed by comparison
circuitry which performs the comparisons and storing
of data in order to determine the optimum value of
codebook index I and gain G. The comparison circuitry
is comprised of multiplier 654; comparator 656; registers
658, 660 and 662; and quantizer 664. The comparison
circuitry provides for each codebook subframe the
values for I and G which minimize the error between
the synthesized speech and the input speech. The codebook
gain G is quantized in quantizer 664 which DPCM
codes the values during quantization in a manner similar
to the bias removed LSP frequency quantization and
coding as described with reference to FIG. 12. These
values for I and G are then provided to data buffer 222.

In the quantization and DPCM encoding, the codebook
gain G is computed in accordance with the following
equation:

$$\text{Quantized } G_i = 20 \log G_i - 0.45\,\bigl(20 \log G_{i-1} + 20 \log G_{i-2}\bigr) \tag{45}$$

where 20 log G_{i-1} and 20 log G_{i-2} are the respective
values computed for the immediately previous frame
(i-1) and the frame preceding the immediately previous
frame (i-2).

The LSP, I, G, L and b values along with the rate are
provided to data packing subsystem 236 where the data
is arranged for transmission. In one implementation the
LSP, I, G, L and b values along with the rate may be
provided to decoder 234 via data packing subsystem
236. In another implementation these values may be
provided via data buffer 222 to decoder 234 for use in
the pitch search. However in the preferred embodiment
protection of the codebook sign bit is employed within
data packing subsystem 236 which may affect the codebook
index. Therefore this protection must be taken
into account should I and G data be provided directly
from data buffer 222.

In data packing subsystem 236 the data may be
packed in accordance with various formats for transmission.
FIG. 18 illustrates an exemplary embodiment
of the functional elements of data packing subsystem
236. Data packing subsystem 236 is comprised of
pseudorandom generator (PN) 670, cyclic redundancy check
(CRC) computational element 672, data protection
logic 674 and data combiner 676. PN generator 670
receives the rate and for eighth rate generates a 4-bit
random number that is provided to data combiner 676.
CRC element 672 receives the codebook gain and LSP
values along with the rate, and for full rate generates an
11-bit internal CRC code that is provided to data combiner
676.

Data combiner 676 receives the random number and the
CRC code, and along with the rate and LSP, I, G, L
and b values from data buffer 222 (FIG. 7b) provides an
output to transmission channel data processor subsystem
238. In the implementation where the data is provided
directly from data buffer 222 to decoder 234, at a
minimum the PN generator 4-bit number is provided
from PN generator 670 via data combiner 676 to decoder
234. At full rate the CRC bits are included along
with the frame data as output from data combiner 676,
while at eighth rate the codebook index value is
dropped and replaced by the random 4-bit number.

In the exemplary embodiment it is preferred that
protection be provided to the codebook gain sign bit.
Protection of this bit is to make the vocoder decoder
less sensitive to a single bit error in this bit. If the sign bit
were changed due to an undetected error, the codebook
index would point to a vector unrelated to the optimum.
In the error situation without protection, the negative
of the optimum vector would be selected, a vector
which is in essence the worst possible vector to be used.
The protection scheme employed herein ensures that a
single bit error in the gain sign bit will not cause the
negative of the optimum vector to be selected in the
error situation. Data protection logic 674 receives the
codebook index and gain and examines the sign bit of
the gain value. If the gain value sign bit is determined to
be negative the value 89 is added, mod 128, to the associated
codebook index. The codebook index whether or
not modified is output from data protection logic 674 to
data combiner 676.

In the exemplary embodiment it is preferred that at
full rate, the most perceptually sensitive bits of the compressed
voice packet data are protected, such as by an
internal CRC (cyclic redundancy check). Eleven extra
bits are used to perform this error detection and correction
function which is capable of correcting any single
error in the protected block. The protected block consists
of the most significant bit of the 10 LSP frequencies
and the most significant bit of the 8 codebook gain
values. If an uncorrectable error occurs in this block,
the packet is discarded and an erasure, described later, is
declared. Otherwise, the pitch gain is set to zero but the
rest of the parameters are used as received. In the exemplary
embodiment a cyclic code is chosen to have a
generator polynomial of:

$$g(x) = 1 + x^3 + x^5 + x^6 + x^8 + x^9 + x^{10} \tag{46}$$

yielding a (31,21) cyclic code. However, it should be
understood that other generator polynomials may be
used. An overall parity bit is appended to make it a
(32,21) code. Since there are only 18 information bits,
the first 3 digits in the code word are set to zero and not
transmitted. This technique provides added protection
such that if the syndrome indicates an error in these
positions, it means there is an uncorrectable error. The
encoding of a cyclic code in systematic form involves
the computation of parity bits as x^10 u(x) modulo g(x)
where u(x) is the message polynomial.

At the decoding end, the syndrome is calculated as
the remainder from dividing the received vector by
g(x). If the syndrome indicates no error, the packet is
accepted regardless of the state of the overall parity bit.
If the syndrome indicates a single error, the error is
corrected if the state of the overall parity bit does not
check. If the syndrome indicates more than one error,
the packet is discarded. Further details on such an error
protection scheme, including syndrome calculation, can
be found in section 4.5 of "Error Control Coding:
Fundamentals and Applications" by Lin and Costello.

In a CDMA cellular telephone system implementation
the data is provided from data combiner 676 to
transmission channel data processor subsystem 238 for
data packing for transmission in 20 msec data transmission
frames. In a transmission frame in which the vocoder
is set for full rate, 192 bits are transmitted for an
effective bit rate of 9.6 kbps. The transmission frame in
this case is comprised of one mixed mode bit used to
indicate mixed frame type (0 = voice only, 1 = voice and
data/signaling); 160 vocoder data bits along with 11
internal CRC bits; 12 external or frame CRC bits; and 8
tail or flush bits. At half rate, 80 vocoder data bits are
transmitted along with 8 frame CRC bits and 8 tail bits
for an effective bit rate of 4.8 kbps. At quarter rate, 40
vocoder data bits are transmitted along with 8 tail bits
for an effective bit rate of 2.4 kbps. Finally, at eighth
rate 16 vocoder data bits are transmitted along with 8
tail bits for an effective bit rate of 1.2 kbps.

Further details on the modulation employed in a
CDMA system in which the vocoder of the present
invention is to be employed are disclosed in copending
U.S. patent application Ser. No. 07/543,496, filed Jun.
25, 1990, and entitled "SYSTEM AND METHOD
FOR GENERATING SIGNAL WAVEFORMS IN
A CDMA CELLULAR TELEPHONE SYSTEM",
assigned to the Assignee of the present invention. In this
system at rates other than full rate a scheme is employed
in which the data bits are organized into groups with
the bit groups pseudorandomly positioned within the 20
msec data transmission frame. It should be understood
that other frame rates and bit representations may
readily be employed other than those presented for
purposes of illustration herein with respect to the vocoder
and the CDMA system implementation, such that
other implementations are available for the vocoder and
other system applications.

In the CDMA system, and also applicable to other
systems, processor subsystem 238 on a frame by frame
basis may interrupt transmission of vocoder data to
transmit other data, such as signalling data or other
non-speech information data. This particular type of
transmission situation is referred to as "blank and
burst." Processor subsystem 238 essentially replaces the
vocoder data with the desired transmission data for the
frame.

Another situation may arise where there is a desire to
transmit both vocoder data and other data during the
same data transmission frame. This particular type of
transmission situation is referred to as "dim and burst".
In a "dim and burst" transmission, the vocoder is provided
with rate bound commands which set the vocoder
final rate at the desired rate, such as half rate. The
half rate encoded vocoder data is provided to processor
subsystem 238 which inserts the additional data along
with the vocoder data for the data transmission frame.

An additional function provided for full-duplex telephone
links is a rate interlock. If one direction of the
link is transmitting at the highest transmission rate, then
the other direction of the link is forced to transmit at the
lowest rate. Even at the lowest rate, sufficient intelligibility
is available for the active talker to realize that he
is being interrupted and to stop talking, thereby allowing
the other direction of the link to assume the active
talker role. Furthermore, if the active talker continues
to talk over an attempted interruption, he will probably
not perceive a degradation in quality because his own
speech "jams" the ability to perceive quality. Again by
using the rate bound commands the vocoder can be set
to vocode the speech at a lower than normal rate.

It should be understood that the rate bound commands
can be used to set the vocoder maximum rate at
less than full rate when additional capacity in the
CDMA system is needed. In a CDMA system in which
a common frequency spectrum is used for transmission,
one user's signal appears as interference to other users in
the system. System user capacity is thus limited by the
total interference caused by system users. As the level
of interference increases, normally due to an increase in
users within the system, a degradation in quality is experienced
by the users due to the increase in interference.

Each user's contribution to interference in the
CDMA system is a function of the user's transmission
data rate. By setting a vocoder to encode speech at a
lower than normal rate, the encoded data is then transmitted
at the corresponding reduced transmission data
rate, which reduces the level of interference caused by
that user. Therefore system capacity may be substantially
increased by vocoding speech at a lower rate. As
system demand increases, user vocoders may be commanded
by the system controller or cell base station to
reduce encoding rate. The vocoder of the present invention
is of a quality such that there is very little,
although some, perceptible difference between speech
encoded at full and half rate. Therefore the effect in
quality of communications between system users where
speech is vocoded at a lower rate, such as half rate, is
less significant than that caused by an increasing level of
interference which results from an increased number of
users in the system.
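
Since each user's interference contribution follows the transmission data rate, it can help to tally the frame compositions given for the 20 msec transmission frame and confirm the quoted bit rates. The Python sketch below does only that arithmetic; the dictionary layout and field names are illustrative assumptions, while the bit counts are the ones stated in the text.

```python
FRAMES_PER_SECOND = 50   # one transmission frame every 20 msec

# Bits per transmission frame, by field, for each vocoder rate.
FRAME_FIELDS = {
    "full":    {"mixed_mode": 1, "vocoder": 160, "internal_crc": 11,
                "frame_crc": 12, "tail": 8},
    "half":    {"vocoder": 80, "frame_crc": 8, "tail": 8},
    "quarter": {"vocoder": 40, "tail": 8},
    "eighth":  {"vocoder": 16, "tail": 8},
}

def frame_bits(rate):
    """Total number of bits in one transmission frame at the given rate."""
    return sum(FRAME_FIELDS[rate].values())

def bit_rate_kbps(rate):
    """Effective bit rate in kbps for the given rate."""
    return frame_bits(rate) * FRAMES_PER_SECOND / 1000.0
```

The totals come to 192, 96, 48 and 24 bits per frame, that is 9.6, 4.8, 2.4 and 1.2 kbps, so a cell commanded from full rate to half rate halves the transmitted data rate per active user.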

Various schemes may therefore be employed to set
individual vocoder rate bounds for lower than normal
vocoding rates. For example, all users in a cell may be
commanded to encode speech at half rate. Such action
substantially reduces system interference, with little
effect in quality in communications between users,
while providing a substantial increase in capacity for
additional users. Until the total interference in the system
is increased by the additional users to a level of
degradation there is no impact in quality in communications
between users.

As mentioned previously, the encoder includes a
copy of the decoder in order to accomplish the
analysis-by-synthesis technique in encoding the frames of speech
samples. As illustrated in FIG. 7, decoder 234 receives
the values L, b, I and G either via data packing subsystem
238 or data buffer 222 for reconstructing the synthesized
speech for comparison with the input speech.
The outputs from decoder are the values M_p, M_a, and
M_w as discussed previously. Further details on decoder
234 as used in the encoder and in reconstructing the
synthesized speech at the other end of the transmission
channel may be discussed together with reference to
FIGS. 19-24.

FIG. 19 is a flow diagram for an exemplary implementation
of the decoder of the present invention. Due
to a common structure of the decoder as implemented
within the encoder, and at the receiver, these implementations
are discussed together. The discussion with respect
to FIG. 19 is primarily concerned with the decoder
at the end of the transmission channel since data
received thereat must be preprocessed in the decoder
whereas in the encoder's decoder the appropriate data
(rate, I, G, L and b) is received directly from data packing
subsystem 238 or data buffer 222. However, the
basic function of the decoder is the same for both encoder
and decoder implementations.

As discussed with reference to FIG. 5, for each codebook
subframe, the codebook vector specified by the
codebook index I is retrieved from the stored codebook.
The vector is multiplied by the codebook gain G and
then filtered by the pitch filter for each pitch subframe
to yield the formant residual. This formant residual is
filtered by the formant filter and then passed through an
adaptive formant postfilter and a brightness postfilter,
along with automatic gain control (AGC) to produce
the output speech signal.
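
The synthesis chain described above (codebook vector, gain, pitch filter, formant filter) can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not the implementation of the present description: the function names are hypothetical, the pitch filter is assumed to have the form y[n] = x[n] + b*y[n-L], the formant filter is assumed to have the form s[n] = r[n] + sum of a_i*s[n-i], and the postfilters and AGC are omitted.

```python
def pitch_filter(excitation, lag, gain, memory):
    """Pitch synthesis (assumed form): y[n] = x[n] + gain * y[n - lag].

    `memory` holds past outputs, most recent last, and is updated in place.
    """
    out = []
    for x in excitation:
        y = x + gain * memory[-lag]
        memory.append(y)
        memory.pop(0)  # the oldest value falls off the delay line
        out.append(y)
    return out


def formant_filter(residual, lpc, memory):
    """Formant synthesis (assumed form): s[n] = r[n] + sum_i lpc[i-1] * s[n-i]."""
    out = []
    for r in residual:
        s = r + sum(a * memory[-(i + 1)] for i, a in enumerate(lpc))
        memory.append(s)
        memory.pop(0)
        out.append(s)
    return out


def decode_block(code_vector, G, L, b, lpc, pitch_mem, formant_mem):
    """Scale the codebook vector by G, then apply pitch and formant filters."""
    scaled = [G * c for c in code_vector]
    formant_residual = pitch_filter(scaled, L, b, pitch_mem)
    return formant_filter(formant_residual, lpc, formant_mem)
```

With an impulse as the codebook vector, a pitch lag of 3 and a pitch gain of 0.5, the output repeats the impulse every 3 samples at half the previous amplitude, which is the periodicity the pitch filter is meant to restore.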

Although the length of codebook and pitch subframe
varies, decoding is done in 40 sample blocks for ease of
implementation. The compressed data received is first
unpacked into codebook gains, codebook indexes, pitch
gains, pitch lags, and LSP frequencies. The LSP frequencies
must be processed through their respective
inverse quantizers and DPCM decoders as discussed
with reference to FIG. 22. Similarly the codebook gain
values must be processed in a similar manner to the LSP
frequencies, except without the bias aspect. Also the
pitch gain values are inverse quantized. These parameters
are then provided for each decoding subframe. In
each decoding subframe, 2 sets of codebook parameters
(G & I), 1 set of pitch parameters (b & L), and 1 set of
LPC coefficients are needed to generate 40 output samples.
FIGS. 20 and 21 illustrate exemplary subframe
decoding parameters for the various rates and other
frame conditions.

For full rate frames, there are 8 sets of received code- 
book parameters and 4 sets of received pitch parame- 
ters. The LSP frequencies are interpolated four times to 
yield 4 sets of LSP frequencies. The parameters re- 
ceived and corresponding subframe information is listed 
in FIG. 20a.

For half rate frames, each set of the four received 
codebook parameters is repeated once, each set of the 
two received pitch parameters is repeated once. The 
LSP frequencies are interpolated three times to yield 4 
sets of LSP frequencies. The parameters received and 
corresponding subframe information is listed in FIG. 
20b. 

For quarter rate frames, each set of the two received 
codebook parameters is repeated four times, the set of 
pitch parameters is also repeated four times. The LSP
frequencies are interpolated once to yield 2 sets of LSP 
frequencies. The parameters received and correspond- 
ing subframe information is listed in FIG. 20c. 

For eighth rate frames, the set of received codebook 
parameters is used for the entire frame. Pitch parame- 
ters are not present for eighth rate frames and the pitch 
gain is simply set to zero. The LSP frequencies are 
interpolated once to yield 1 set of LSP frequencies. The 
parameters received and corresponding subframe infor- 
mation is listed in FIG. 20d. 
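
The per-rate repetition rules above can be summarized in a short sketch that spreads the received parameter sets over the four 40-sample decoding subframes (two codebook sets and one pitch set each). This is a hedged illustration only: the function and constant names are hypothetical, pitch sets are assumed to be (b, L) tuples, eighth rate is represented by a zero pitch gain with no lag, and LSP interpolation is left out.

```python
SUBFRAMES = 4               # 40-sample decoding subframes per frame
CODEBOOK_PER_SUBFRAME = 2   # codebook sets (G, I) needed per decoding subframe

# rate -> (number of codebook sets, number of pitch sets) received per frame
EXPECTED_SETS = {
    "full": (8, 4),
    "half": (4, 2),
    "quarter": (2, 1),
    "eighth": (1, 0),
}


def _spread(sets, slots):
    """Repeat each received set equally often so that `slots` entries result."""
    repeat = slots // len(sets)
    return [s for s in sets for _ in range(repeat)]


def subframe_parameters(rate, codebook_sets, pitch_sets):
    """Return, per decoding subframe, the codebook sets and pitch set to use."""
    n_codebook, n_pitch = EXPECTED_SETS[rate]
    if len(codebook_sets) != n_codebook or len(pitch_sets) != n_pitch:
        raise ValueError("unexpected number of parameter sets for rate " + rate)
    codebook = _spread(codebook_sets, SUBFRAMES * CODEBOOK_PER_SUBFRAME)
    if rate == "eighth":
        # No pitch parameters are sent; the pitch gain is simply zero.
        pitch = [(0.0, None)] * SUBFRAMES
    else:
        pitch = _spread(pitch_sets, SUBFRAMES)
    return [
        {"codebook": codebook[2 * k: 2 * k + 2], "pitch": pitch[k]}
        for k in range(SUBFRAMES)
    ]
```

For a half rate frame, for instance, each of the four received codebook sets fills both codebook slots of one decoding subframe, and each of the two pitch sets covers two consecutive decoding subframes.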

Occasionally, the voice packets may be blanked out
in order for the CDMA cell or mobile station to transmit
signalling information. When the vocoder receives a
blank frame, it continues with a slight modification to
the previous frame's parameters. The codebook gain is
set to zero. The previous frame's pitch lag and gain are
used as the current frame pitch lag and gain except that
the gain is limited to one or less. The previous frame's
LSP frequencies are used as is without interpolation.
Note that the encoding end and the decoding end are
still synchronized and the vocoder is able to recover
from a blank frame very quickly. The parameters received
and corresponding subframe information is listed
in FIG. 21a.
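
As a brief illustration of the blank frame rule above, the following sketch derives the current frame's parameters from the previous frame's. The function name and the dictionary layout are hypothetical and chosen only for this example.

```python
def blank_frame_parameters(prev_pitch_lag, prev_pitch_gain, prev_lsp):
    """Parameters used for a blank frame, derived from the previous frame."""
    return {
        "codebook_gain": 0.0,                      # codebook gain is set to zero
        "pitch_lag": prev_pitch_lag,               # previous pitch lag is reused
        "pitch_gain": min(prev_pitch_gain, 1.0),   # gain limited to one or less
        "lsp": list(prev_lsp),                     # previous LSPs, no interpolation
    }
```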

In the event that a frame is lost due to a channel error, 
the vocoder attempts to mask this error by maintaining 
a fraction of the previous frame's energy and smoothly 
transitioning to background noise. In this case the pitch 
gain is set to zero; a random codebook is selected by 
using the previous subframe's codebook index plus 89; 
the codebook gain is 0.7 times the previous subframe's 
codebook gain. It should be noted that there is nothing 
magic about the number 89, this is just a convenient way 
of selecting a pseudorandom codebook vector. The 
previous frame's LSP frequencies are forced to decay 
toward their bias values as: 



$$\omega_i = 0.9\,(\text{previous } \omega_i - \text{bias value of } \omega_i) + \text{bias value of } \omega_i \tag{47}$$
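
The erasure masking rule and equation (47) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, and wrapping the codebook index modulo 128 on addition is an assumption borrowed from the "mod 128" index arithmetic mentioned elsewhere in this description.

```python
CODEBOOK_SIZE = 128  # assumption: index arithmetic wraps modulo 128


def conceal_erasure(prev_index, prev_gain, prev_lsp, lsp_bias):
    """Derive substitute subframe parameters for a frame lost to channel error."""
    return {
        # pseudorandom vector choice: previous index plus 89 (wrap is assumed)
        "codebook_index": (prev_index + 89) % CODEBOOK_SIZE,
        # keep a fraction of the previous subframe's energy
        "codebook_gain": 0.7 * prev_gain,
        "pitch_gain": 0.0,
        # equation (47): each LSP frequency decays toward its bias value
        "lsp": [0.9 * (w - bias) + bias for w, bias in zip(prev_lsp, lsp_bias)],
    }
```

Applied over several consecutive erasures, the codebook gain shrinks geometrically and the LSP frequencies settle on their bias values, which matches the stated aim of transitioning smoothly to background noise.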
The LSP frequency bias values are shown in Table 5.
The parameters received and corresponding subframe
information is listed in FIG. 21b.

If the rate cannot be determined at the receiver, the
packet is discarded and an erasure is declared. However,
if the receiver determines there is a strong likelihood
the frame was transmitted at full rate, though with
errors, the following is done. As discussed previously at
full rate, the most perceptually sensitive bits of the
compressed voice packet data are protected by an internal
CRC. At the decoding end, the syndrome is calculated
as the remainder from dividing the received vector by
g(x), from equation (46). If the syndrome indicates no
error, the packet is accepted regardless of the state of
the overall parity bit. If the syndrome indicates a single
error, the error is corrected if the state of the overall
parity bit does not check. If the syndrome indicates
more than one error, the packet is discarded. If an
uncorrectable error occurs in this block, the packet is
discarded and an erasure is declared. Otherwise the
pitch gain is set to zero but the rest of the parameters are
used as received with corrections, as illustrated in FIG.
21c.

The postfilters used in this implementation were first
described in "Real-Time Vector APC Speech Coding
At 4800 BPS with Adaptive postfiltering" by J. H.
Chen et al., Proc. ICASSP, 1987. Since speech formants
are perceptually more important than spectral valleys,
the postfilter boosts the formants slightly to improve
the perceptual quality of the coded speech. This is done
by scaling the poles of the formant synthesis filter radially
toward the origin. However, an all pole postfilter
generally introduces a spectral tilt which results in muffling
of the filtered speech. The spectral tilt of this all
pole postfilter is reduced by adding zeros having the
same phase angles as the poles but with smaller radii,
resulting in a postfilter of the form:

$$PF(z) = \frac{A(z/\rho)}{A(z/\sigma)}, \qquad 0 < \rho < \sigma < 1 \tag{48}$$

where A(z) is the formant prediction filter and the values
ρ and σ are the postfilter scaling factors where ρ is
set to 0.5, and σ is set to 0.8.

An adaptive brightness filter is added to further compensate
for the spectral tilt introduced by the formant
postfilter. The brightness filter is of the form:

$$B(z) = \frac{1 - k z^{-1}}{1 + k z^{-1}} \tag{49}$$

where the value of k (the coefficient of this one tap
filter) is determined by the average value of the LSP
frequencies which approximates the change in the spectral
tilt of A(z).

To avoid any large gain excursions resulting from
postfiltering, an AGC loop is implemented to scale the
speech output so that it has roughly the same energy as
the non-postfiltered speech. Gain control is accomplished
by dividing the sum of the squares of the 40
filter input samples by the sum of the squares of the 40
filter output samples to get the inverse filter gain. The
square root of this gain factor is then smoothed:

$$\text{Smoothed gain} = 0.02\,(\text{current gain}) + 0.98\,(\text{previous gain}) \tag{50}$$

and then the filter output is multiplied with this
smoothed inverse gain to produce the output speech.

In FIG. 19 the data from the channel along with the
rate, either transmitted along with the data or derived
by other means, is provided to data unpacking subsystem
700. In an exemplary implementation for a CDMA
system a rate decision can be derived from the error rate
of the received data when it is decoded at each of the
different rates. In data unpacking subsystem 700, at full
rate, a check of the CRC is made for errors with the
result of this check provided to subframe data unpack
subsystem 702. Subsystem 700 provides an indication of
abnormal frame conditions such as a blank frame, erasure
frame or error frame with usable data to subsystem
702. Subsystem 700 provides the rate along with the
parameters I, G, L, and b for the frame to subsystem
702. In determining the codebook index I and gain G
values, the sign bit of the gain value is checked in
subsystem 702. If the sign bit is negative, the value 89 is
subtracted, mod 128, from the associated codebook
index. Furthermore in subsystem 702 the codebook gain is
inverse quantized and DPCM decoded, while the pitch
gain is inverse quantized.

Subsystem 700 also provides the rate and the LSP
frequencies to LSP inverse quantization/interpolation
subsystem 704. Subsystem 700 further provides an indication
of a blank frame, erasure frame or error frame
with usable data to subsystem 704. Decode subframe
counter 706 provides an indication of the subframe
count value i and j to both subsystems 702 and 704.

In subsystem 704 the LSP frequencies are inverse
quantized and interpolated. FIG. 22 illustrates an
implementation of the inverse quantization portion of
subsystem 704, while the interpolation portion is substantially
identical to that described with reference to FIG. 12. In
FIG. 22, the inverse quantization portion of subsystem
704 is comprised of inverse quantizer 750, which is
constructed identical to that of inverse quantizer 468 of
FIG. 12 and operates in a similar manner. The output of
inverse quantizer 750 is provided as one input to adder
752. The other input to adder 752 is provided as the
output of multiplier 754. The output of adder 752 is
provided to register 756 where stored and output for
multiplication with the constant 0.9 in multiplier 754.
The output from adder 752 is also provided to adder 758
where the bias value is added back into the LSP frequency.
The ordering of the LSP frequencies is ensured
by logic 760 which forces the LSP frequencies to be of
a minimum separation. Generally the need to force
separation does not occur unless an error occurs in
transmission. The LSP frequencies are then interpolated
as discussed with reference to FIG. 13 and with reference
to FIGS. 20a-20d and 21a-21c.

Referring back to FIG. 19, memory 708 is coupled to
subsystem 704 for storing previous frame LSPs,
and may also be used to store the bias values bω_i. These
previous frame values are used in the interpolation for
all rates. For conditions of blanking, erasure or error
frame with usable data, the previous LSPs ω_{i,f-1} are
used in accordance with the chart in FIGS. 21a-21c. In
response to a blank frame indication from subsystem
700, subsystem 704 retrieves the previous frame LSP
frequencies stored in memory 708 for use in the current
frame. In response to an erasure frame indication,
subsystem 704 again retrieves the previous frame LSP
frequencies from memory 708 along with the bias values
so as to compute the current frame LSP frequencies as
discussed above. In performing this computation the




stored bias value is subtracted from the previous frame
LSP frequency in an adder, with the result multiplied in
a multiplier by a constant value of 0.9 with this result
added in an adder to the stored bias value. In response
to an error frame with usable data indication, the LSP
frequencies are interpolated as was for full rate if the
CRC passes.

The LSPs are provided to LSP to LPC transformation
subsystem 710 where the LSP frequencies are converted
back to LPC values. Subsystem 710 is substantially
identical to LSP to LPC transformation subsystems
218 and 228 of FIG. 7 and as described with reference
to FIG. 13. The LPC coefficients α_i are then provided
to both formant filter 714 and formant postfilter
716. The LSP frequencies are also averaged over the
subframe in LSP averager subsystem 712 and provided
to adaptive brightness filter 718 as the value k.

Subsystem 702 receives the parameters I, G, L, and b
for the frame from subsystem 700 along with the rate or
abnormal frame condition indication. Subsystem 702
also receives from subframe counter 706 the j counts for
each i count in each decode subframe 1-4. Subsystem
702 is also coupled to memory 720 which stores the
previous frame values for G, I, L and b for use in abnormal
frame conditions. Subsystem 702 under normal
frame conditions, except for eighth rate, provides the
codebook index value I to codebook 722; the codebook
gain value G to multiplier 724; and the pitch lag L and
gain b values to pitch filter 726 in accordance with FIGS.
20a-20d. For eighth rate, since there is no value for the
codebook index sent, a packet seed which is the 16-bit
parameter value (FIG. 2d) for eighth rate is provided to
codebook 722 along with a rate indication. For abnormal
frame conditions the values are provided from subsystem
702 in accordance with FIGS. 21a-21c. Furthermore
for eighth rate, an indication is provided to codebook
722 as is discussed with reference to FIG. 23.

In response to a blank frame indication from subsystem
700, subsystem 702 retrieves the previous frame
pitch lag L and gain b values, except the gain is limited
to one or less, stored in memory 708 for use in the current
frame decode subframes. Furthermore no codebook
index I is provided and the codebook gain G is set
to zero. In response to an erasure frame indication,
subsystem 702 again retrieves the previous frame subframe
codebook index from memory 720 and adds in an
adder the value of 89. The previous frame subframe
codebook gain is multiplied in a multiplier by the constant
0.7 to produce the respective subframe values of
G. No pitch lag value is provided while the pitch gain
is set to zero. In response to an error frame with usable
data indication, the codebook index and gain are used as
in a full rate frame, provided the CRC passes, while no
pitch lag value is provided and the pitch gain is set to
zero.

As discussed with reference to the encoder's decoder
in the analysis-by-synthesis technique, the codebook
index I is used as the initial address for the codebook
value for output to multiplier 724. The codebook gain
value is multiplied in multiplier 724 with the output
value from codebook 722 with the result provided to
pitch filter 726. Pitch filter 726 uses the input pitch lag
L and gain b values to generate the formant residual
which is output to formant filter 714. In formant filter
714 the LPC coefficients are used in filtering the formant
residual so as to reconstruct the speech. At the
receiver decoder the reconstructed speech is further
filtered by formant postfilter 716 and adaptive brightness
filter 718. AGC loop 728 is used at the output of
formant filter 714 and formant postfilter 716 with output
thereof multiplied in multiplier 730 with the output
of adaptive brightness filter 718. The output of multiplier
730 is the reconstructed speech which is then converted
to analog form using known techniques and
presented to the listener. In the encoder's decoder, the
perceptual weighting filter is placed at the output in
order to update its memories.

Referring to FIG. 22, further details of the implementation
of the decoder itself are illustrated. In FIG. 22
codebook 722 is comprised of memory 750 similar to
that described with reference to FIG. 17. However for
purposes of explanation a slightly different approach is
illustrated for memory 750 and the addressing thereof is
illustrated in FIG. 22. Codebook 722 is further comprised
of switch 752, multiplexer 753 and pseudorandom
number (PN) generator 754. Switch 752 is responsive
to the codebook index for pointing to the index
address location of memory 750, as was discussed with
reference to FIG. 17. Memory 750 is a circular memory
with switch 752 pointing to the initial memory location
with the values shifted through the memory for output.
The codebook values are output from memory 750
through switch 752 as one input to multiplexer 753.
Multiplexer 753 is responsive to the rates of full, half
and quarter for providing an output of the values provided
through switch 752 to codebook gain amplifier,
multiplier 724. Multiplexer 753 is also responsive to the
eighth rate indication for selecting the output of PN
generator 754 for the output of codebook 722 to multiplier
724.

In order to maintain high voice quality in CELP
coding, the encoder and decoder must have the same
values stored in their internal filter memories. This is
done by transmitting the codebook index, so that the
decoder's and encoder's filters are excited by the same
sequence of values. However, for the highest speech
quality these sequences consist of mostly zeroes with
some spikes distributed among them. This type of excitation
is not optimum for coding background noise.

In coding background noise, done at the lowest data
rate, a pseudorandom sequence may be implemented to
excite the filters. In order to ensure that the filter memories
are the same in the encoder and decoder, the two
pseudorandom sequences must be the same. A seed
must be transmitted somehow to the receiver decoder.
Since there are no additional bits that could be used to
send the seed, the transmitted packet bits can be used as
the seed, as if they made up a number. This technique
can be done because, at the low rate, the exact same
CELP analysis by synthesis structure to determine the
codebook gain and index is used. The difference is that
the codebook index is thrown out, and the encoder filter
memories are instead updated using a pseudorandom
sequence. Therefore the seed for the excitation can be
determined after the analysis is done. In order to ensure
that the packets themselves do not periodically cycle
between a set of bit patterns, four random bits are inserted
in the eighth rate packet in place of the codebook
index values. Therefore the packet seed is the 16-bit
value as referenced in FIG. 2d.

PN generator 754 is constructed using well known
techniques and may be implemented by various algorithms.
In the exemplary embodiment the algorithm
employed is of a nature as described in the article "DSP
chips can produce random numbers using proven algorithm"
by Paul Mennen, EDN, Jan. 21, 1991. The transmitted
bit packet is used as the seed (from subsystem 700
of FIG. 18) for generating the sequence. In one implementation
the seed is multiplied by the value 521 with
the value 259 added thereto. From this resulting value
the least significant bits are used as a signed 16 bit number.
This value is then used as the seed in generating the
next codebook value. The sequence generated by the
PN generator is normalized to have a variance of 1.

Each value output from codebook 722 is multiplied in
multiplier 724 by the codebook gain G as provided
during the decode subframe. This value is provided as
one input to adder 756 of pitch filter 726. Pitch filter 726
is further comprised of multiplier 758 and memory 760.
The pitch lag L determines the position of a tap of
memory 760 that is output to multiplier 758. The output
of memory 760 is multiplied in multiplier 758 with the
pitch gain value b with the result output to adder 756.
The output of adder 756 is provided to an input of memory
760 which is a series of delay elements such as a
shift register. The values are shifted through memory
760 (in a direction as indicated by the arrow) and provided
at the selected tap output as determined by the
value of L. Since the values are shifted through memory
760, values older than 143 shifts are discarded. The
output of adder 756 is also provided as an input to formant
filter 714.

The output of adder 756 is provided to one input of
adder 762 of formant filter 714. Formant filter 714 is
further comprised of a bank of multipliers 764a-764j and
memory 766. The output of adder 762 is provided as an
input to memory 766 which is also constructed as a
series of tapped delay elements such as a shift register.
The values are shifted through memory 766 (in a direction
as indicated by the arrow) and are dumped at the
end. Each element has a tap which provides the value
stored there as an output to a corresponding one of
multipliers 764a-764j. Each one of multipliers
764a-764j also receives a respective one of the LPC
coefficients α_1-α_10 for multiplication with the output
from memory 766. The output from adder 762 is provided
as an output of formant filter 714.

The output of formant filter 714 is provided as an
input to formant postfilter 716 and AGC subsystem 728.
Formant postfilter 716 is comprised of adders 768 and
770 along with memory 772 and multipliers
774a-774j, 776a-776j, 780a-780j, and 782a-782j. As the
values are shifted through memory 772 they are output
at the corresponding taps for multiplication with the
scaled LPC coefficient values for summation in adders
768 and 770. The output from formant postfilter 716 is
provided as an input to adaptive brightness filter 718.

Adaptive brightness filter 718 is comprised of adders
784 and 786, registers 788 and 790, and multipliers 792
and 794. FIG. 24 is a chart illustrating the characteristics
of the adaptive brightness filter. The output of formant
postfilter 716 is provided as one input to adder 784
while the other input is provided from the output of
multiplier 792. The output of adder 784 is provided to
register 788 and stored for one cycle and output during
the next cycle to multipliers 792 and 794 along with the
value -k provided from LSP averager 712 of FIG. 19.
The outputs from multipliers 792 and 794 are provided
both to adders 784 and 786. The output from adder 786
is provided to AGC subsystem 728 and to shift register
790. Register 790 is used as a delay line to ensure coordination
in the data output from formant filter 714 to
AGC subsystem 728 and provided to adaptive brightness
filter 718 via formant postfilter 716.

AGC subsystem 728 receives the data from formant
postfilter 716 and adaptive brightness filter 718 so as to
scale the speech output energy to about that of the
speech input to formant postfilter 716 and adaptive
brightness filter 718. AGC subsystem 728 is comprised
of multipliers 798, 800, 802 and 804; adders 806, 808 and
810; register 812, 814 and 816; divider 818; and square
root element 820. The 40 sample output from formant
postfilter 716 is squared in multiplier 798 and summed in
an accumulator comprised of adder 806 and register 812
to produce the value "x". Similarly the 40 sample output
from adaptive brightness filter 718, taken prior to
register 790, is squared in multiplier 800 and summed in
an accumulator comprised of adder 808 and register 814
to produce the value "y". The value "y" is divided by
the value "x" in divider 816 to result in the inverse gain
of the filters. The square root of the inverse gain factor
is taken in element 818 with the result thereof
smoothed. The smoothing operation is accomplished by
multiplying the current value gain by the constant
value 0.02 in multiplier 802 with this result added in
adder 810 to the result of 0.98 times the previous gain as
computed using register 820 and multiplier 804. The
output of filter 718 is then multiplied with the smoothed
inverse gain in multiplier 730 to provide the output
reconstructed speech. The output speech is then converted
to analog form using known
techniques for output to the user.

It should be understood the embodiment of the present
invention as disclosed herein is but an exemplary
embodiment and that variations in the embodiment may
be realized which are the functional equivalent. The
present invention may be implemented in a digital signal
processor under appropriate program control to provide
the functional operation as disclosed herein to
encode the speech samples and decode the encoded
speech. In other implementations the present invention
may be embodied in an application specific integrated
circuit (ASIC) using well known very large scale integration
(VLSI) techniques.

The previous description of the preferred embodiments
is provided to enable any person skilled in the art
to make or use the present invention. The various modifications
to these embodiments will be readily apparent
to those skilled in the art, and the generic principles
defined herein may be applied to other embodiments
without the use of the inventive faculty. Thus, the present
invention is not intended to be limited to the embodiments
shown herein but is to be accorded the widest
scope consistent with the principles and novel features
disclosed herein.

We claim:

1. A method of speech signal compression, by variable
rate coding of frames of digitized speech samples,
comprising the steps of:

determining a level of speech activity for a frame of
digitized speech samples;

selecting an encoding rate from a set of rates based
upon said determined level of speech activity for
said frame;

coding said frame according to a coding format of a
set of coding formats for said selected rate wherein
each rate has a corresponding different coding
format and wherein each coding format provides
for a different plurality of parameter signals representing
said digitized speech samples in accordance
with a speech model; and
generating for said frame a data packet of said parameter
signals.

2. The method of claim 1 wherein said step of deter- 
mining said level of frame speech activity comprises the 
steps of: 

measuring speech activity in said frame of digitized 
speech samples; 

comparing said measured speech activity with at least 
one speech activity threshold level of a predeter- 
mined set of activity threshold levels; and 

adaptively adjusting in response to said comparison at 
least one of said at least one speech activity thresh- 
old levels with respect to a level of activity of a 
previous frame of digitized speech samples. 

3. The method of claim 1 further comprising the steps 
of: 

providing a rate command indicative of a preselected
encoding rate for said frame; and

modifying said selected encoding rate to provide said
preselected encoding rate for coding of said frame
at said preselected encoding rate.

4. The method of claim 3 wherein said preselected 
rate is less than a predetermined maximum rate, said 
method further comprising the steps of: 

providing an additional data packet; and 
combining said data packet with said additional data 
packet within a transmission frame for transmis- 
sion. 

5. The method of claim 1 wherein said step of provid- 
ing said data packet of said parameter signals comprises: 

generating a variable number of bits to represent 
linear predictive coefficient (LPC) vector signals 
of said frame of digitized speech samples, wherein 
said variable number of bits representing said LPC
vector signals is responsive to said measured 
speech activity level; 

generating a variable number of bits to represent 
pitch vector signals of said frame of digitized 
speech samples, wherein said variable number of 
bits representing said pitch vector signals is respon- 
sive to said measured speech activity level; and 

generating variable number of bits to represent code- 
book excitation vector signals of said frame of digi- 
tized speech samples, wherein said variable number
of bits representing said codebook excitation vec- 
tor signals is responsive to said measured speech 
activity level. 

6. The method of claim 1 wherein said step coding 
said frame comprises: 

generating for said frame a variable number of linear 
prediction coefficients wherein said variable num- 
ber of said linear prediction coefficients is respon- 
sive to said selected encoding rate; 

generating for said frame a variable number of pitch 
coefficients wherein said variable number of said 
pitch coefficients is responsive to said selected 
encoding rate; and 

generating for said frame a variable number of code- 
book excitation values wherein said variable number
of said codebook excitation values is responsive
to said selected encoding rate. 

7. The method of claim 1 wherein said step of deter- 
mining a level of speech activity comprises summing 
the squares of the values of said digitized speech samples.

8. The method of claim 7 further comprising the step 
of generating error protection bits for said data packet. 
9. The method of claim 8 wherein said step of generating
error protection bits for said data packet wherein
the number of said protection bits is responsive to said 
frame of speech activity level. 

10. The method of claim 1 wherein said step of adap- 
tively adjusting speech activity threshold levels com- 
prises the steps of: 

comparing said measured speech activity to said at 
least one of speech activity thresholds and incre- 
mentally increasing said at least one of speech ac- 
tivity thresholds toward the level of said frame 
speech activity when said frame speech activity 
exceeds said at least one of said speech activity 
thresholds; and 

comparing said measured speech activity to said at 
least one of speech activity thresholds and decreas- 
ing said at least one of speech activity thresholds to 
the level of said frame speech activity when said 
frame speech activity is less than said at least one of 
speech activity thresholds. 

11. The method of claim 10 wherein said step of se- 
lecting an encoding rate is responsive to an external rate 
signal. 

12. The method of claim 8 wherein said step of gener- 
ating error protection for said data packet further com- 
prises determining the values of said error protection 
bits in accordance with a cyclic block code. 

13. The method of claim 1 further comprising the step 
of pre-multiplying said digitized speech samples by a 
predetermined windowing function. 

14. The method of claim 1 further comprising the step 
of converting said LPC coefficients to line spectral pair 
(LSP) values. 

15. The apparatus of claim 1 wherein said input frame 
of digitized samples comprises digitized values for ap- 
proximately twenty milliseconds of speech. 

16. The apparatus of claim 1 wherein said input frame 
of digitized samples comprises approximately 160 digi- 
tized samples. 

17. The apparatus of claim 1 wherein said output data 
packet comprises: 

one hundred and seventy one bits comprised of forty 
bits for LPC data, forty bits for pitch data, eighty 
bits for excitation vector data and eleven bits for 
error protection when said output data rate is full 
rate; 

eighty bits comprised of twenty bits for LPC infor- 
mation, twenty bits for pitch information and forty 
bits for excitation vector data when said output 
data rate is half rate; 

forty bits comprised of ten bits for LPC information, 
ten bits for pitch information and twenty bits for 
excitation vector data when said output data rate is 
quarter rate; and 

sixteen bits comprised of ten bits for LPC information 
and six bits for excitation vector information when 
said output data rate is eighth rate. 
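The bit allocations recited in claim 17 can be checked with a short calculation. The sketch below tabulates the per-field bit counts from the claim and derives the packet size and the resulting bit rate. It assumes exactly 50 frames per second, which follows from taking the "approximately twenty milliseconds" of claim 15 as exactly 20 ms; the dictionary and function names are hypothetical.

```python
FRAMES_PER_SECOND = 50  # assumption: one frame every 20 ms (claim 15)

# Bits per field for each rate, as recited in claim 17.
PACKET_BITS = {
    "full": {"lpc": 40, "pitch": 40, "excitation": 80, "error_protection": 11},
    "half": {"lpc": 20, "pitch": 20, "excitation": 40},
    "quarter": {"lpc": 10, "pitch": 10, "excitation": 20},
    "eighth": {"lpc": 10, "excitation": 6},
}


def packet_size(rate):
    """Total number of bits in one packet at the given rate."""
    return sum(PACKET_BITS[rate].values())


def bits_per_second(rate):
    """Packet bit rate under the 50 frames per second assumption."""
    return packet_size(rate) * FRAMES_PER_SECOND


for name in PACKET_BITS:
    print(name, packet_size(name), "bits/frame,", bits_per_second(name), "bit/s")
```

Under that assumption the four packet sizes of 171, 80, 40 and 16 bits correspond to 8550, 4000, 2000 and 800 bits per second.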

18. An apparatus for compressing an acoustical signal 
into variable rate data comprising: 

means for determining a level of audio activity for an 
input frame of digitized samples of said acoustical 
signal; 

means for selecting an output data rate from a prede- 
termined set of rates based upon said determined 
level of audio activity within said frame; 

means for coding said frame according to a coding 
format of a set of coding formats for said selected 
rate to provide a plurality of parameter signals 
wherein each rate has a corresponding different
coding format with each coding format providing a
different plurality of parameter signals representing
said digitized speech samples in accordance
with a speech model; and

means for providing for said frame a corresponding
data packet at a data rate corresponding to said
selected rate.

19. The apparatus for compressing an acoustical signal
of claim 18 wherein said output data packet comprises:

a variable number of bits to represent LPC vector
signals of said frame of digitized speech samples,
wherein said variable number of bits for representing
said LPC vector signals is responsive to said
level of audio activity;

a variable number of bits to represent pitch vector
signals of said frame of digitized speech samples,
wherein said variable number of bits for representing
said pitch vector signals is responsive to said
level of audio activity; and

a variable number of bits to represent codebook excitation
vector signals of said frame of digitized
speech samples, wherein said variable number of
bits for representing said codebook excitation vector
signals is responsive to said level of audio activity.

20. The apparatus for compressing an acoustical signal
of claim 18 wherein said means for determining said
level of audio activity comprises:

means for determining an energy value for said input
frame;

means for comparing said input frame energy with
said at least one audio activity thresholds; and

means for providing an indication when said input
frame activity exceeds each corresponding one of
said at least one audio activity thresholds.

21. The apparatus of claim 20 further comprising a
means for adaptively adjusting said at least one of said at
least one audio activity thresholds.

22. The apparatus of claim 18 wherein said means for
determining said energy of said input frame comprises:

squaring means for squaring said digitized audio samples
of a frame; and

summing means for summing said squares of digitized
audio samples of a frame.

23. The apparatus of claim 18 wherein said means for
determining a level of audio activity comprises:

means for calculating a set of linear predictive coefficients
for said input frame of digitized samples of
said acoustical signals; and

means for determining said level of audio activity in
accordance with at least one of said linear predictive
coefficients.

24. The apparatus of claim 18 further comprising
means for providing error protection bits for said data
packet responsive to said selected output data rate.

25. The apparatus of claim 24 wherein said means for
providing error protection bits provides the values of
said error protection bits in accordance with a cyclic
block code.

26. The apparatus of claim 18 further comprising a
means for converting said LPC coefficients to line spectral
pair (LSP) values.

27. The apparatus of claim 18 wherein said set of rates
comprises full rate, half rate, quarter rate and eighth
rate.

28. The apparatus of claim 18 wherein said set of rates
comprises 16 Kbps, 8 Kbps, 4 Kbps and 2 Kbps.

29. A circuit for compressing an acoustical signal into
variable rate data comprising:

a circuit for determining a level of audio activity for
an input frame of digitized samples of said acoustical
signal;

a circuit for selecting an output data rate from a predetermined
set of rates based upon said determined
level of audio activity within said frame;

a circuit for coding said frame according to a coding
format of a set of coding formats for said selected
rate to provide a plurality of parameter signals
wherein each rate has a corresponding different
coding format with each coding format providing a
different plurality of parameter signals representing
said digitized speech samples in accordance
with a speech model; and

a circuit for providing for said frame a corresponding
data packet at a data rate corresponding to said
selected rate.

30. The circuit of claim 29 wherein said circuit for
determining said level of audio activity comprises:

a circuit for determining an energy value for said input
frame;

a circuit for comparing said input frame energy with
said at least one audio activity thresholds; and

a circuit for providing an indication when said input
frame activity exceeds each corresponding one of
said at least one audio activity thresholds.

31. The circuit of claim 30 wherein said circuit for
determining said level of audio activity further comprises
a circuit for adaptively adjusting said at least one
of said at least one audio activity thresholds.

32. The circuit of claim 29 wherein said circuit for
determining said energy of said input frame determines
said energy by summing the squares of the values of said
digitized samples.

33. The circuit of claim 29 wherein said circuit for
determining a level of audio activity determines said
energy by calculating a set of linear predictive coefficients
for said input frame and determines said level of
audio activity in accordance with at least one of said
linear predictive coefficients.

34. The circuit of claim 29 wherein said input frame
of digitized samples comprises digitized speech for a
duration of approximately twenty milliseconds.

35. The circuit of claim 29 wherein said input frame
of digitized samples comprises 160 digitized samples.

36. The circuit of claim 29 further comprising means
for providing error protection bits for said data packet
responsive to said selected output data rate.

37. The circuit of claim 36 further comprises a means
for determining the values of said error protection bits
in accordance with a cyclic block code.

38. The circuit of claim 37 wherein said cyclic block
code operates in accordance with a generator polynomial
of 1 + x^3 + x^5 + x^6 + x^8 + x^9 + x^10.

39. The circuit of claim 29 further comprising means
for pre-multiplying said digitized samples by a predetermined
windowing function.

40. The circuit of claim 39 wherein said predetermined
windowing function is a Hamming window.

41. The circuit of claim 29 wherein said set of rates
comprises full rate, half rate, quarter rate and eighth
rate.

42. The circuit of claim 29 wherein said set of rates
comprises 16 Kbps, 8 Kbps, 4 Kbps and 2 Kbps.
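Claims 37 and 38 recite error protection bits formed by a cyclic block code with generator polynomial 1 + x^3 + x^5 + x^6 + x^8 + x^9 + x^10. The sketch below shows one conventional way to obtain parity bits from such a polynomial, namely systematic encoding by polynomial division over GF(2). The systematic form, the bit ordering, and the choice of which packet bits are protected are all assumptions not stated in the claims, and the names `poly_mod`, `parity_bits` and `encode` are hypothetical. Because the generator has degree 10, the remainder is 10 bits wide; how the eleventh error protection bit of the full rate packet is formed is not covered here.

```python
# Generator polynomial coefficients from x^10 down to x^0:
# x^10 + x^9 + x^8 + x^6 + x^5 + x^3 + 1
GENERATOR = 0b11101101001
DEGREE = 10


def poly_mod(value, generator=GENERATOR):
    """Remainder of value divided by generator, both as GF(2) polynomials."""
    gen_len = generator.bit_length()
    while value.bit_length() >= gen_len:
        # Cancel the current leading term with a shifted copy of the generator.
        shift = value.bit_length() - gen_len
        value ^= generator << shift
    return value


def parity_bits(message):
    """Ten parity bits for the message bits (assumed systematic encoding)."""
    return poly_mod(message << DEGREE)


def encode(message):
    """Codeword: message bits followed by their parity bits."""
    return (message << DEGREE) | parity_bits(message)


def is_valid(codeword):
    """A codeword is valid when it is divisible by the generator."""
    return poly_mod(codeword) == 0
```

With this construction every valid codeword is a multiple of the generator polynomial, so flipping any single bit of a codeword leaves a nonzero remainder and is detected by `is_valid`.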

43. The circuit of claim 29 wherein said output data
packet comprises: 

a variable number of bits to represent LPC vector 
signals of said frame of digitized speech samples, 
wherein said variable number of bits for representing
said LPC vector signals is responsive to said
level of audio activity; 

a variable number of bits to represent pitch vector 
signals of said frame of digitized speech samples, 
wherein said variable number of bits for representing
said pitch vector signals is responsive to said
level of audio activity; and 

a variable number of bits to represent codebook exci- 
tation vector signals of said frame of digitized 
speech samples, wherein said variable number of 
bits for representing said codebook excitation vec- 
tor signals is responsive to said level of audio activ- 
ity. 

44. The circuit of claim 43 wherein said output data
packet further comprises a variable number of bits for 
error protection, wherein said variable number of bits 
for error protection is responsive to said level of audio 
activity. 

45. The circuit of claim 29 wherein said output data
packet comprises: 

one hundred and seventy one bits comprised of forty 
bits for LPC data, forty bits for pitch data, eighty 
bits for excitation vector data and eleven bits for 
error protection when said output data rate is full
rate; 

eighty bits comprised of twenty bits for LPC infor- 
mation, twenty bits for pitch information and forty 
bits for excitation vector data when said output 
data rate is half rate;

forty bits comprised of ten bits for LPC information, 
ten bits for pitch information and twenty bits for 
excitation vector data when said output data rate is 
quarter rate; and 

sixteen bits comprised of ten bits for LPC information
and six bits for excitation vector information when 
said output data rate is eighth rate. 

46. The circuit of claim 29 wherein said means for
selecting an encoding rate is responsive to an external 
rate signal. 

47. The circuit of claim 29 further comprising a 
means for converting said LPC coefficients to line spec- 
tral pair (LSP) values. 

48. A method of speech signal compression by vari- 
able rate coding of frames of digitized speech samples 
comprising the steps of: 

multiplying one frame of digitized speech samples in 
a sequence of said frames of digitized speech sam- 
ples by a windowing function to provide a win- 
dowed frame of speech data; 

calculating a set of autocorrelation coefficients from 
said windowed frame of speech; 

determining an encoding rate from said set of auto- 
correlation coefficients; 

calculating from said set of autocorrelation coeffici- 
ents a set of linear predictive coding (LPC) coeffi- 
cients; 

converting said set of LPC coefficients to a set of line 
spectral pair values; 

quantizing said set of line spectral pair (LSP) coeffici- 
ents in accordance with said rate command and 
said encoding rate; 

selecting a pitch value from a predetermined set of 
pitch values to provide a selected pitch value for 
each pitch subframe in each frame of digitized 
speech; 

quantizing said selected pitch value in accordance 
with said encoding rate and said rate command; 

selecting a codebook value from a predetermined set 
of pitch values to provide a selected pitch value for 
a pitch frame; 

quantizing said selected codebook value in accor- 
dance with said encoding rate and said rate com- 
mand; and 

generating an output data packet comprising said 
quantized line spectral pair values, quantized se- 
lected pitch value, and quantized selected code- 
book value. 
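The first steps of claim 48 (windowing, autocorrelation, and derivation of LPC coefficients from the autocorrelation coefficients) can be sketched as follows. The claim does not name a window or a solution method; the Hamming window is borrowed from claim 40, and the Levinson-Durbin recursion is a standard technique swapped in here as an assumption. The function names and the prediction convention s[n] ≈ a[1]·s[n-1] + ... + a[p]·s[n-p] are likewise choices made for this example.

```python
import math


def hamming_window(n):
    """Hamming window of length n (assumed window; n must be at least 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(frame, max_lag):
    """Autocorrelation coefficients r[0..max_lag] of one windowed frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    Returns (coefficients, residual_energy).
    """
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0:
            break  # silent or degenerate frame: stop early
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


# Usage on a synthetic 160-sample frame (160 samples per claim 16; the
# sine input and the predictor order of 10 are illustrative assumptions).
samples = [math.sin(0.3 * i) for i in range(160)]
window = hamming_window(len(samples))
windowed = [s * w for s, w in zip(samples, window)]
r = autocorrelation(windowed, 10)
lpc, residual = levinson_durbin(r, 10)
```

The zero-lag coefficient `r[0]` equals the energy of the windowed frame, which is one way a rate decision could be drawn from the autocorrelation coefficients as the claim recites.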

* * * * * 